Abstract



We develop, analyze, and evaluate a novel, supervised, specific-to-general learner for a sim- 
ple temporal logic and use the resulting algorithm to learn visual event definitions from video 
sequences. First, we introduce a simple, propositional, temporal, event-description language called
AMA that is sufficiently expressive to represent many events yet sufficiently restrictive to support 
learning. We then give algorithms, along with lower and upper complexity bounds, for the sub- 
sumption and generalization problems for AMA formulas. We present a positive-examples-only 
specific-to-general learning method based on these algorithms. We also present a polynomial-time-computable
"syntactic" subsumption test that implies semantic subsumption without being
equivalent to it. A generalization algorithm based on syntactic subsumption can be used in place of 
semantic generalization to improve the asymptotic complexity of the resulting learning algorithm. 
Finally, we apply this algorithm to the task of learning relational event definitions from video and 
show that it yields definitions that are competitive with hand-coded ones. 

1. Introduction 

Humans conceptualize the world in terms of objects and events. This is reflected in the fact that 
we talk about the world using nouns and verbs. We perceive events taking place between objects, 
we interact with the world by performing events on objects, and we reason about the effects that 
actual and hypothetical events performed by us and others have on objects. We also learn new 
object and event types from novel experience. In this paper, we present and evaluate novel imple- 
mented techniques that allow a computer to learn new event types from examples. We show results 
from an application of these techniques to learning new event types from automatically constructed 
relational, force-dynamic descriptions of video sequences. 

We wish the acquired knowledge of event types to support multiple modalities. Humans can 
observe someone faxing a letter for the first time and quickly be able to recognize future occurrences 
of faxing, perform faxing, and reason about faxing. It thus appears likely that humans use and 
learn event representations that are sufficiently general to support fast and efficient use in multiple 
modalities. A long-term goal of our research is to allow similar cross-modal learning and use of 
event representations. We intend the same learned representations to be used for vision (as described 
in this paper), planning (something that we are beginning to investigate), and robotics (something 
left to the future). 

A crucial requirement for event representations is that they capture the invariants of an event 
type. Humans classify both picking up a cup off a table and picking up a dumbbell off the floor 
as picking up. This suggests that human event representations are relational. We have an abstract 
relational notion of picking up that is parameterized by the participant objects rather than distinct
propositional notions instantiated for specific objects. Humans also classify an event as picking 
up no matter whether the hand is moving slowly or quickly, horizontally or vertically, leftward or 
rightward, or along a straight path or circuitous one. It appears that it is not the characteristics of 
participant-object motion that distinguish picking up from other event types. Rather, it is the fact 
that the object being picked up changes from being supported by resting on its initial location to 
being supported by being grasped by the agent. This suggests that the primitive relations used to 
build event representations are force dynamic (Talmy, 1988).

Another desirable property of event representations is that they be perspicuous. Humans can 
introspect and describe the defining characteristics of event types. Such introspection is what allows
us to create dictionaries. To support such introspection, we prefer a representation language
that allows such characteristics to be explicitly manifest in event definitions and not emergent
consequences of distributed parameters as in neural networks or hidden Markov models.

We develop a supervised learner for an event representation possessing these desired charac- 
teristics as follows. First, we present a simple, propositional, temporal logic called AMA that is a 
sublanguage of a variety of familiar temporal languages (e.g., linear temporal logic, or LTL, Bacchus
& Kabanza, 2000; event logic, Siskind, 2001). This logic is expressive enough to describe a
variety of interesting temporal events, but restrictive enough to support an effective learner, as we 
demonstrate below. We proceed to develop a specific-to-general learner for the AMA logic by giv- 
ing algorithms and complexity bounds for the subsumption and generalization problems involving 
AMA formulas. While we show that semantic subsumption is intractable, we provide a weaker syn- 
tactic notion of subsumption that implies semantic subsumption but can be checked in polynomial 
time. Our implemented learner is based upon this syntactic subsumption. 

We next show means to adapt this (propositional) AMA learner to learn relational concepts. 
We evaluate the resulting relational learner in a complete system for learning force-dynamic event 
definitions from positive-only training examples given as real video sequences. This is not the first 
system to perform visual-event recognition from video. We review prior work and compare it to 
the current work later in the paper. In fact, two such prior systems have been built by one of the 
authors. Howard (Siskind & Morris, 1996) learns to classify events from video using temporal, 
relational representations. But these representations are not force dynamic. LEONARD (Siskind, 
2001) classifies events from video using temporal, relational, force-dynamic representations but 
does not learn these representations. It uses a library of hand-coded representations. This work adds
a learning component to LEONARD, essentially duplicating the performance of the hand-coded 
definitions automatically. 

While we have demonstrated the utility of our learner in the visual-event-learning domain, we 
note that there are many domains where interesting concepts take the form of structured tempo- 
ral sequences of events. In machine planning, macro-actions represent useful temporal patterns of 
action. In computer security, typical application behavior, represented perhaps as temporal patterns
of system calls, must be differentiated from compromised application behavior (and likewise
authorized-user behavior from intrusive behavior). 

In what follows, Section 2 introduces our application domain of recognizing visual events and
provides an informal description of our system for learning event definitions from video. Section 3 
introduces the AMA language, syntax and semantics, and several concepts needed in our analysis 
of the language. Section 4 develops and analyzes algorithms for the subsumption and generalization 
problems in the language, and introduces the more practical notion of syntactic subsumption. Section 5
extends the basic propositional learner to handle relational data and negation, and to control
exponential run-time growth. Section 6 presents our results on visual-event learning. Sections 7 
and 8 compare to related work and conclude. 

2. System Overview 

This section provides an overview of our system for learning to recognize visual events from video. 
The aim is to provide an intuitive picture of our system before providing technical details. A formal 
presentation of our event-description language, algorithms, and both theoretical and empirical re- 
sults appears in Sections 3-6. We first introduce the application domain of visual-event recognition 
and the Leonard system, the event recognizer upon which our learner is built. Second, we describe 
how our positive-only learner fits into the overall system. Third, we informally introduce the AMA 
event-description language that is used by our learner. Finally, we give an informal presentation of 
the learning algorithm. 

2.1 Recognizing Visual Events 

Leonard (Siskind, 2001) is a system for recognizing visual events from video camera input — 
an example of a simple visual event is "a hand picking up a block." This research was originally 
motivated by the problem of adding a learning component to LEONARD — allowing LEONARD to 
learn to recognize an event by viewing example events of the same type. Below, we give a high-level 
description of the Leonard system. 

Leonard is a three-stage pipeline depicted in Figure 1. The raw input consists of a video-frame 
image sequence depicting events. First, a segmentation-and-tracking component transforms this 
input into a polygon movie: a sequence of frames, each frame being a set of convex polygons placed 
around the tracked objects in the video. Figure 2a shows a partial video sequence of a pick up event 
that is overlaid with the corresponding polygon movie. Next, a model-reconstruction component 
transforms the polygon movie into a force-dynamic model. This model describes the changing 
support, contact, and attachment relations between the tracked objects over time. Constructing 
this model is a somewhat involved process as described in Siskind (2000). Figure 2b shows a 
visual depiction of the force-dynamic model corresponding to the pick up event. Finally, an event- 
recognition component armed with a library of event definitions determines which events occurred 
in the model and, accordingly, in the video. Figure 2c shows the text output and input of the 
event-recognizer for the pick up event. The first line corresponds to the output, which indicates
the interval(s) during which a pick up occurred. The remaining lines are the text encoding of the 
event-recognizer input (model-reconstruction output), indicating the time intervals in which various 
force-dynamic relations are true in the video. 

The event-recognition component of LEONARD represents event types with event-logic formulas
like the following simplified example, representing x picking up y off of z.

\[ \textsc{PickUp}(x, y, z) \triangleq (\textsc{Supports}(z, y) \wedge \textsc{Contacts}(z, y)) \,;\, (\textsc{Supports}(x, y) \wedge \textsc{Attached}(x, y)) \]

This formula asserts that an event of x picking up y off of z is defined as a sequence of two states 
where z supports y by way of contact in the first state and x supports y by way of attachment in 
the second state. Supports, Contacts, and Attached are primitive force-dynamic relations. 
This formula is a specific example of the more general class of AMA formulas that we use in our 
learning. 
Figure 1: The upper boxes represent the three primary components of Leonard's pipeline. The
lower box depicts the event-learning component described in this paper. The input to the
learning component consists of training models of target events (e.g., movies of pick up
events) along with event labels (e.g., PickUp(hand, red, green)) and the output is an
event definition (e.g., a temporal logic formula defining PickUp(x, y, z)).



2.2 Adding a Learning Component 

Prior to the work reported in this paper, the definitions in Leonard's event-recognition library 
were hand coded. Here, we add a learning component to Leonard so that it can learn to recognize 
events. Figure 1 shows how the event learner fits into the overall system. The input to the event 
learner consists of force-dynamic models from the model-reconstruction stage, along with event 
labels, and its output consists of event definitions which are used by the event recognizer. We take 
a supervised-learning approach where the force-dynamic model-reconstruction process is applied
to training videos of a target event type. The resulting force-dynamic models along with labels 
indicating the target event type are then given to the learner which induces a candidate definition of 
the event type. 

For example, the input to our learner might consist of two models corresponding to two videos, 
one of a hand picking up a red block off of a green block with label PickUp(hand, red, green) and
one of a hand picking up a green block off of a red block with label PickUp(hand, green, red). The
output would be a candidate definition of PickUp(x, y, z) that is applicable to previously unseen
pick up events. Note that our learning component is positive-only in the sense that when learning 
a target event type it uses only positive training examples (where the target event occurs) and does 
not use negative examples (where the target event does not occur). The positive-only setting is of 
interest as it appears that humans are able to learn many event definitions given primarily or only 
positive examples. From a practical standpoint, a positive-only learner removes the often difficult 
task of collecting negative examples that are representative of what is not the event to be learned 
(e.g., what is a typical "non-pickup" event?). 

The construction of our learner involves two primary design choices. First, we must choose an 
event representation language to serve as the learner's hypothesis space (i.e., the space of event def- 
initions it may output). Second, we must design an algorithm for selecting a "good" event definition 
from the hypothesis space given a set of training examples of an event type. 

2.3 The AMA Hypothesis Space 

The full event logic supported by Leonard is quite expressive, allowing the specification of a 
wide variety of temporal patterns (formulas). To help support successful learning, we use a more 
restrictive subset of event logic, called AMA, as our learner's hypothesis space. This subset excludes
many practically useless formulas that may "confuse" the learner, while still retaining substantial
expressiveness, thus allowing us to represent and learn many useful event types. Our restriction to
AMA formulas is a form of syntactic learning bias.

Figure 2: LEONARD recognizes a pick up event. (a) Frames from the raw video input with the
automatically generated polygon movie overlaid. (b) The same frames with a visual depiction
of the automatically generated force-dynamic properties. (c) The text input/output of the
event classifier corresponding to the depicted movie. The top line is the output and the
remaining lines make up the input that encodes the changing force-dynamic properties.
GREEN represents the block on the table and RED represents the block being picked up.

The most basic AMA formulas are called states, which express constant properties of time intervals
of arbitrary duration. For example, $\textsc{Supports}(z, y) \wedge \textsc{Contacts}(z, y)$ is a state which tells us
that z must support and be in contact with y. In general, a state can be the conjunction of any number
of primitive propositions (in this case force-dynamic relations). Using AMA we can also describe
sequences of states. For example, $(\textsc{Supports}(z, y) \wedge \textsc{Contacts}(z, y)) \,;\, (\textsc{Supports}(x, y) \wedge \textsc{Attached}(x, y))$
is a sequence of two states, with the first state as given above and the second
state indicating that x must support and be attached to y. This formula is true whenever the first 
state is true for some time interval, followed immediately by the second state being true for some 
time interval "meeting" the first time interval. Such sequences are called MA timelines since they 
are the Meets of Ands. In general, MA timelines can contain any number of states. Finally, we can 
conjoin MA timelines to get AMA formulas (Ands of MA's). For example, the AMA formula 

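\[ \begin{aligned}
& [(\textsc{Supports}(z, y) \wedge \textsc{Contacts}(z, y)) \,;\, (\textsc{Supports}(x, y) \wedge \textsc{Attached}(x, y))] \;\wedge \\
& [(\textsc{Supports}(u, v) \wedge \textsc{Attached}(u, v)) \,;\, (\textsc{Supports}(w, v) \wedge \textsc{Contacts}(w, v))]
\end{aligned} \]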

defines an event where two MA timelines must be true simultaneously over the same time interval. 
Using AMA formulas we can represent events by listing various property sequences (MA timelines), 
all of which must occur in parallel as an event unfolds. It is important to note, however, that the 
transitions between states of different timelines in an AMA formula can occur in any relation to one 
another. For example, in the above AMA formula, the transition between the two states of the first
timeline can occur before, after, or exactly at the transition between states of the second timeline. 

An important assumption leveraged by our learner is that the primitive propositions used to con- 
struct states describe liquid properties (Shoham, 1987). For our purposes, we say that a property is 
liquid if when it holds over a time-interval it holds over all of its subintervals. The force-dynamic 
properties produced by Leonard are liquid — e.g., if a hand Supports a block over an interval 
then clearly the hand supports the block over all subintervals. Because primitive propositions are 
liquid, properties described by states (conjunctions of primitives) are also liquid. However, proper- 
ties described by MA and AMA formulas are not, in general, liquid. 

2.4 Specific-to-General Learning from Positive Data 

Recall that the examples that we wish to classify and learn from are force-dynamic models, which 
can be thought of (and are derived from) movies depicting temporal events. Also recall that our 
learner outputs definitions from the AMA hypothesis space. Given an AMA formula, we say that 
it covers an example model if it is true in that model. For a particular target event type (such as 
PickUp), the ultimate goal is for the learner to output an AMA formula that covers an example 
model if and only if the model depicts an instance of the target event type. To understand our 
learner, it is useful to define a generality relationship between AMA formulas. We say that AMA
formula $\Psi_1$ is more general (less specific) than AMA formula $\Psi_2$ if and only if $\Psi_1$ covers every
example that $\Psi_2$ covers (and possibly more).¹

1. In our formal analysis, we will use two different notions of generality (semantic and syntactic). In this section, we 
ignore such distinctions. We note, however, that the algorithm we informally describe later in this section is based on 
the syntactic notion of generality. 
If the only learning goal is to find an AMA formula that is consistent with a set of positive-only
training data, then one result can be the trivial solution of returning the formula that covers
all examples. Rather than fix this problem by adding negative training examples (which will rule
out the trivial solution), we instead change the learning goal to be that of finding the least-general
formula that covers all of the positive examples.² This learning approach has been pursued for a
variety of different languages within the machine-learning literature, including clausal first-order 
logic (Plotkin, 1971), definite clauses (Muggleton & Feng, 1992), and description logic (Cohen & 
Hirsh, 1994). It is important to choose an appropriate hypothesis space as a bias for this learning 
approach or the hypothesis returned may simply be (or resemble) one of two extremes, either the 
disjunction of the training examples or the universal hypothesis that covers all examples. In our 
experiments, we have found that, with enough training data, the least-general AMA formula often 
converges usefully. 

We take a standard specific-to-general machine-learning approach to finding the least-general 
AMA formula that covers a set of positive examples. The approach relies on the computation of two 
functions: the least-general covering formula (LGCF) of an example model and the least-general 
generalization (LGG) of a set of AMA formulas. The LGCF of an example model is the least general 
AMA formula that covers the example. Intuitively, the LGCF is the AMA formula that captures the 
most information about the model. The LGG of any set of AMA formulas is the least-general AMA 
formula that is more general than each formula in the set. Intuitively, the LGG of a formula set is 
the AMA formula that captures the largest amount of common information among the formulas. 
Viewed differently, the LGG of a formula set covers all of the examples covered by those formulas, 
but covers as few other examples as possible (while remaining in AMA).³

The resulting specific-to-general learning approach proceeds as follows. First, use the LGCF 
function to transform each positive training model into an AMA formula. Second, return the LGG 
of the resulting formulas. The result represents the least-general AMA formula that covers all of 
the positive training examples. Thus, to specify our learner, all that remains is to provide algo- 
rithms for computing the LGCF and LGG for the AMA language. Below we informally describe 
our algorithms for computing these functions, which are formally derived and analyzed in Sec- 
tions 3.4 and 4. 
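
The two-step procedure above can be sketched generically. The following is only an illustrative sketch, not the implemented learner: the function name `learn` and its parameters `lgcf` and `lgg` are hypothetical, and the usage line instantiates them for a toy hypothesis space of single states (plain conjunctions), where the LGCF of an example is the set of its true propositions and the LGG of two conjunctions is their intersection. Folding a pairwise LGG over the formulas mirrors the sequence of pairwise LGG applications used for m formulas.

```python
from functools import reduce
from typing import Callable, Iterable, TypeVar

Model = TypeVar("Model")
Formula = TypeVar("Formula")


def learn(
    models: Iterable[Model],
    lgcf: Callable[[Model], Formula],
    lgg: Callable[[Formula, Formula], Formula],
) -> Formula:
    """Specific-to-general learning from positive examples only.

    Step 1: map each positive example to its least-general covering formula.
    Step 2: fold a pairwise least-general generalization over those formulas.
    """
    formulas = [lgcf(m) for m in models]
    if not formulas:
        raise ValueError("at least one positive example is required")
    return reduce(lgg, formulas)


# Toy instantiation (an assumption for illustration, not AMA itself):
# hypotheses are conjunctions of propositions, represented as frozensets.
toy_examples = [{"a", "b", "c"}, {"a", "b", "d"}]
toy_result = learn(toy_examples, frozenset, frozenset.intersection)
```

For AMA, `lgcf` and `lgg` would be replaced by the operations described informally in Sections 2.5 and 2.6.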

2.5 Computing the AMA LGCF 

To increase the readability of our presentation, in what follows, we dispense with presenting exam- 
ples where the primitive properties are meaningfully named force-dynamic relations. Rather, our 
examples will utilize abstract propositions such as a and b. In our current application, these propo- 
sitions correspond exclusively to force-dynamic properties, but may not for other applications. We 
now demonstrate how our system computes the LGCF of an example model. 

Consider the following example model: {a@[1, 4], b@[3, 6], c@[6, 6], d@[1, 3], d@[5, 6]}. Here,
we take each number (1, ..., 6) to represent a time interval of arbitrary (possibly varying with the
number) duration during which nothing changes, and then each fact p@[i, j] indicates that proposition
p is continuously true throughout the time intervals numbered i through j. This model can
be depicted graphically, as shown in Figure 3. The top four lines in the figure indicate the time 

2. This avoids the need for negative examples and corresponds to finding the specific boundary of the version space 
(Mitchell, 1982). 

3. The existence and uniqueness of the LGCF and LGG defined here is a formal property of the hypothesis space and is 
proven for AMA in Sections 3.4 and 4, respectively. 
$(a \wedge d) \,;\, (a \wedge b \wedge d) \,;\, (a \wedge b) \,;\, (b \wedge d) \,;\, (b \wedge c \wedge d)$

Figure 3: LGCF Computation. The top four horizontal lines of the figure indicate the intervals
over which the propositions a, b, c, and d are true in the model given by
{a@[1, 4], b@[3, 6], c@[6, 6], d@[1, 3], d@[5, 6]}. The bottom line shows how the model
can be divided into intervals where no transitions occur. The LGCF is an MA timeline,
shown at the bottom of the figure, with a state for each of the no-transition intervals. Each
state simply contains the true propositions within the corresponding interval.



intervals over which each of the propositions a, b, c, and d is true in the model. The bottom line
in the figure shows how the model can be divided into five time intervals where no propositions 
change truth value. This division is possible because of the assumption that our propositions are 
liquid. This allows us, for example, to break up the time-interval where a is true into three consec- 
utive subintervals where a is true. After dividing the model into intervals with no transitions, we 
compute the LGCF by simply treating each of those intervals as a state of an MA timeline, where 
the states contain only those propositions that are true during the corresponding time interval. The 
resulting five-state MA timeline is shown at the bottom of the figure. We show later that this simple 
computation returns the LGCF for any model. Thus, we see that the LGCF of a model is always an 
MA timeline. 
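
The LGCF computation just described can be sketched in a few lines. This is a minimal sketch under stated assumptions: a model is given as a non-empty list of hypothetical `(proposition, first, last)` triples standing for facts p@[i, j], states are represented as frozensets of proposition names, and the function name `lgcf` is ours. Adjacent numbered intervals with identical truth assignments are merged, which is how the six numbered intervals of the example yield five states.

```python
from typing import FrozenSet, List, Tuple

Fact = Tuple[str, int, int]          # (proposition, first index, last index), inclusive
State = FrozenSet[str]               # conjunction of propositions; frozenset() is `true`
Timeline = Tuple[State, ...]         # MA timeline as an ordered sequence of states


def lgcf(facts: List[Fact]) -> Timeline:
    """Return the MA timeline with one state per maximal no-transition interval."""
    start = min(i for _, i, _ in facts)
    end = max(j for _, _, j in facts)
    timeline: List[State] = []
    for t in range(start, end + 1):
        # Propositions true throughout the numbered interval t.
        state = frozenset(p for p, i, j in facts if i <= t <= j)
        # Start a new state only when some proposition changes truth value.
        if not timeline or timeline[-1] != state:
            timeline.append(state)
    return tuple(timeline)


# The example model {a@[1,4], b@[3,6], c@[6,6], d@[1,3], d@[5,6]}.
example_model = [("a", 1, 4), ("b", 3, 6), ("c", 6, 6), ("d", 1, 3), ("d", 5, 6)]
example_lgcf = lgcf(example_model)
```

On the example model this produces the five-state timeline (a ∧ d); (a ∧ b ∧ d); (a ∧ b); (b ∧ d); (b ∧ c ∧ d).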

2.6 Computing the AMA LGG 

We now describe our algorithm for computing the LGG of two AMA formulas. The LGG of m
formulas can be computed via a sequence of m − 1 pairwise LGG applications, as discussed later.
Consider the two MA timelines: $\Phi_1 = (a \wedge b \wedge c); (b \wedge c \wedge d); e$ and $\Phi_2 = (a \wedge b \wedge e); a; (e \wedge d)$.
It is useful to consider the various ways in which both timelines can be true simultaneously along 
an arbitrary time interval. To do this, we look at the various ways in which the two timelines 
can be aligned along a time interval. Figure 4a shows one of the many possible alignments of 
these timelines. We call such alignments interdigitations. In general, there are exponentially many
interdigitations, each one ordering the state transitions differently. Note that an interdigitation is
allowed to constrain two transitions from different timelines to occur simultaneously (though this is
not depicted in the figure).⁴

4. Thus, an interdigitation provides an "ordering" relation on transitions that need not be anti-symmetric, but is reflexive, 
transitive, and total. 
Figure 4: Generalizing the MA timelines $(a \wedge b \wedge c); (b \wedge c \wedge d); e$ and $(a \wedge b \wedge e); a; (e \wedge d)$. (a)
One of the exponentially many interdigitations of the two timelines. (b) Computing the
interdigitation generalization corresponding to the interdigitation from part (a). States are
formed by intersecting aligned states from the two timelines. The state true represents a
state with no propositions.



Given an interdigitation of two timelines, it is easy to construct a new MA timeline that must be 
true whenever either of the timelines is true (i.e., to construct a generalization of the two timelines). 
In Figure 4b, we give this construction for the interdigitation given in Figure 4a. The top two 
horizontal lines in the figure correspond to the interdigitation, only here we have divided every state 
on either timeline into two identical states, whenever a transition occurs during that state in the other 
timeline. The resulting pair of timelines have only simultaneous transitions and can be viewed as 
a sequence of state pairs, one from each timeline. The bottom horizontal line is then labeled by
an MA timeline with one state for each such state pair, with that state being the intersection of the 
proposition sets in the state pair. Here, true represents the empty set of propositions, and is a state 
that is true anywhere. 

We call the resulting timeline an interdigitation generalization (IG) of $\Phi_1$ and $\Phi_2$. It should be
clear that this IG will be true whenever either $\Phi_1$ or $\Phi_2$ is true. In particular, if $\Phi_1$ holds along a
time-interval in a model, then there is a sequence of consecutive (meeting) subintervals where the
states of $\Phi_1$ are true in sequence. By construction, the IG can be aligned relative to $\Phi_1$ along the
interval so that when we view states as sets, the states in the IG are subsets of the corresponding
aligned state(s) in $\Phi_1$. Thus, the IG states are all true in the model under the alignment, showing
that the IG is true in the model.

In general, there are exponentially many IGs of two input MA timelines, one for each possible 
interdigitation between the two. Clearly, since each IG is a generalization of the input timelines, 
then so is the conjunction of all the IGs. This conjunction is an AMA formula that generalizes the
input MA timelines. In fact, we show later in the paper that this AMA formula is the LGG of the
two timelines. Below we show the conjunction of all the IGs of $\Phi_1$ and $\Phi_2$ which serves as their
LGG.
While this formula is an LGG, it contains redundant timelines that can be pruned. First, it is
clear that different IGs can result in the same MA timelines, and we can remove all but one copy
of each timeline from the LGG. Second, note that if a timeline $\Phi'$ is more general than a timeline
$\Phi$, then $\Phi \wedge \Phi'$ is equivalent to $\Phi$. Thus, we can prune away timelines that are generalizations of
others. Later in the paper, we show how to efficiently test whether one timeline is more general
than another. After performing these pruning steps, we are left with only the first and next to last
timelines in the above formula. Thus, $[(a \wedge b); a; d; e] \wedge [(a \wedge b); b; e; true; e]$ is an LGG of $\Phi_1$ and
$\Phi_2$.
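
The enumeration of interdigitation generalizations can be sketched as follows. This is a sketch under our own modeling assumptions, not the authors' implementation: an interdigitation of two timelines is modeled as a monotone path of index pairs from the first pair of states to the last, where each step advances one or both timelines by exactly one state (a simultaneous transition advances both). The names `interdigitations`, `ig`, and `all_igs` are hypothetical, and the pruning of more-general timelines is omitted because it needs the subsumption test developed later in the paper.

```python
from typing import FrozenSet, Iterator, List, Set, Tuple

State = FrozenSet[str]               # frozenset() plays the role of `true`
Timeline = Tuple[State, ...]
Path = List[Tuple[int, int]]         # co-occurring state index pairs, in temporal order


def interdigitations(m: int, n: int) -> Iterator[Path]:
    """Yield every alignment of an m-state timeline with an n-state timeline."""
    def extend(path: Path) -> Iterator[Path]:
        i, j = path[-1]
        if (i, j) == (m - 1, n - 1):
            yield list(path)         # copy, since `path` is mutated below
            return
        # Advance the first timeline, the second, or both simultaneously.
        for di, dj in ((1, 0), (0, 1), (1, 1)):
            if i + di < m and j + dj < n:
                path.append((i + di, j + dj))
                yield from extend(path)
                path.pop()
    yield from extend([(0, 0)])


def ig(t1: Timeline, t2: Timeline, path: Path) -> Timeline:
    """Intersect the aligned states along one interdigitation."""
    return tuple(t1[i] & t2[j] for i, j in path)


def all_igs(t1: Timeline, t2: Timeline) -> Set[Timeline]:
    """All distinct IGs; their conjunction generalizes both input timelines."""
    return {ig(t1, t2, p) for p in interdigitations(len(t1), len(t2))}


fs = frozenset
phi1 = (fs("abc"), fs("bcd"), fs("e"))
phi2 = (fs("abe"), fs("a"), fs("ed"))
igs = all_igs(phi1, phi2)
```

Collecting the IGs in a set already removes duplicate timelines, which corresponds to the first pruning step; the two timelines that survive the second pruning step both appear in `igs`.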

We have demonstrated how to compute the LGG of pairs of MA timelines. We can use this 
procedure to compute the LGG of pairs of AMA formulas. Given two AMA formulas we compute 
their LGG by simply conjoining the LGGs of all pairs of timelines (one from each AMA formula) — 
i.e., the formula 

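\[ \bigwedge_{i=1}^{m} \bigwedge_{j=1}^{n} \mathrm{LGG}(\Phi_i, \Phi'_j) \]

is an LGG of the two AMA formulas $\Phi_1 \wedge \cdots \wedge \Phi_m$ and $\Phi'_1 \wedge \cdots \wedge \Phi'_n$, where the $\Phi_i$ and $\Phi'_j$ are
MA timelines.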

We have now informally described the LGCF and LGG operations needed to carry out the 
specific-to-general learning approach described above. In what follows, we more formally develop 
these operations and analyze the theoretical properties of the corresponding problems, then discuss 
the needed extensions to bring these (exponential, propositional, and negation-free) operations to 
practice. 

3. Representing Events with AMA 

Here we present a formal account of the AMA hypothesis space and an analytical development of the 
algorithms needed for specific-to-general learning for AMA. Readers that are primarily interested in 
a high-level view of the algorithms and their empirical evaluation may wish to skip Sections 3 and 4 
and instead proceed directly to Sections 5 and 6, where we discuss several practical extensions to 
the basic learner and then present our empirical evaluation. 

We study a subset of an interval-based logic called event logic (Siskind, 2001) utilized by 
Leonard for event recognition in video sequences. This logic is interval-based in explicitly representing
each of the possible interval relationships given originally by Allen (1983) in his calculus
of interval relations (e.g., "overlaps," "meets," "during"). Event-logic formulas allow the definition 
of event types which can specify static properties of intervals directly and dynamic properties by 
hierarchically relating sub-intervals using the Allen relations. In this paper, the formal syntax and 
semantics of full event logic are needed only for Proposition 4 and are given in Appendix A. 

Here we restrict our attention to a much simpler subset of event logic we call AMA, defined 
below. We believe that our choice of event logic rather than first-order logic, as well as our restriction
to the AMA fragment of event logic, provide a useful learning bias by ruling out a large number of 
"practically useless" concepts while maintaining substantial expressive power. The practical utility 
of this bias is demonstrated via our empirical results in the visual-event-recognition application. 
AMA can also be seen as a restriction of LTL (Bacchus & Kabanza, 2000) to conjunction and 
"Until," with similar motivations. Below we present the syntax and semantics of AMA along with 
some of the key technical properties of AMA that will be used throughout this paper. 

3.1 AMA Syntax and Semantics 

It is natural to describe temporal events by specifying a sequence of properties that must hold over 
consecutive time intervals. For example, "a hand picking up a block" might become "the block 
is not supported by the hand and then the block is supported by the hand." We represent such 
sequences with MA timelines,⁵ which are sequences of conjunctive state restrictions. Intuitively, an
MA timeline is given by a sequence of propositional conjunctions, separated by semicolons, and is 
taken to represent the set of events that temporally match the sequence of consecutive conjunctions. 
An AMA formula is then the conjunction of a number of MA timelines, representing events that 
can be simultaneously viewed as satisfying each of the conjoined timelines. Formally, the syntax of 
AMA formulas is given by, 

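\[
\begin{array}{rcl}
\mathit{state} & ::= & \mathit{true} \mid \mathit{prop} \mid \mathit{prop} \wedge \mathit{state} \\
\mathit{MA} & ::= & (\mathit{state}) \mid (\mathit{state});\ \mathit{MA} \qquad \text{// may omit parens} \\
\mathit{AMA} & ::= & \mathit{MA} \mid \mathit{MA} \wedge \mathit{AMA}
\end{array}
\]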

where prop is any primitive proposition (sometimes called a primitive event type). We take this
grammar to formally define the terms MA timeline, MA formula, AMA formula, and state. A $k$-MA
formula is an MA formula with at most $k$ states, and a $k$-AMA formula is an AMA formula
all of whose MA timelines are $k$-MA timelines. We often treat states as proposition sets, with
true the empty set, and AMA formulas as MA-timeline sets. We may also treat MA formulas as
sets of states; it is important to note, however, that MA formulas may contain duplicate states,
and the duplication can be significant. For this reason, when treating MA timelines as sets, we
formally intend sets of state-index pairs (where the index gives a state's position in the formula).
We do not indicate this explicitly to avoid encumbering our notation, but the implicit index must be
remembered whenever handling duplicate states.

5. MA stands for "Meets/And," an MA timeline being the "Meet" of a sequence of conjunctively restricted intervals.

The semantics of AMA formulas is defined in terms of temporal models. A temporal model
$\mathcal{M} = \langle M, I \rangle$ over the set PROP of propositions is a pair of a mapping $M$ from the natural
numbers (representing time) to the truth assignments over PROP, and a closed natural-number interval $I$.
We note that Siskind (2001) gives a continuous-time semantics for event logic where the models
are defined in terms of real-valued time intervals. The temporal models defined here use discrete
natural-number time-indices. However, our results here still apply under the continuous-time
semantics. (That semantics bounds the number of state changes in the continuous timeline to a
countable number.) It is important to note that the natural numbers in the domain of $M$ represent
time discretely, but that there is no prescribed unit of continuous time represented by each natural
number. Instead, each number represents an arbitrarily long period of continuous time during which
nothing changed. Similarly, the states in our MA timelines represent arbitrarily long periods of time
during which the conjunctive restriction given by the state holds. The satisfiability relation for AMA
formulas is given as follows:

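• A state $s$ is satisfied by a model $\langle M, I \rangle$ iff $M[x]$ assigns $P$ true for every $x \in I$ and $P \in s$.

• An MA timeline $s_1; s_2; \ldots; s_n$ is satisfied by a model $\langle M, [t, t'] \rangle$ iff there exists some $t''$
in $[t, t']$ such that $\langle M, [t, t''] \rangle$ satisfies $s_1$ and either $\langle M, [t'', t'] \rangle$ or $\langle M, [t''+1, t'] \rangle$ satisfies
$s_2; \ldots; s_n$.

• An AMA formula $\Phi_1 \wedge \Phi_2 \wedge \cdots \wedge \Phi_n$ is satisfied by $\mathcal{M}$ iff each $\Phi_i$ is satisfied by $\mathcal{M}$.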

The condition defining satisfaction for MA timelines may appear unintuitive at first due to the
fact that there are two ways that $s_2; \ldots; s_n$ can be satisfied. The reason for this becomes clear by
recalling that we are using the natural numbers to represent continuous time intervals. Intuitively, from
a continuous-time perspective, an MA timeline is satisfied if there are consecutive continuous-time
intervals satisfying the sequence of consecutive states of the MA timeline. The transition between
consecutive states $s_j$ and $s_{j+1}$ can occur either within an interval of constant truth assignment (that
happens to satisfy both states) or exactly at the boundary of two time intervals of constant truth
value. In the above definition, these cases correspond to $s_2; \ldots; s_n$ being satisfied during the time
intervals $[t'', t']$ and $[t''+1, t']$ respectively.
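The satisfaction relation above can be transcribed almost literally into a recursive check. The following is a minimal Python sketch, not the method of Siskind (2001); the representation is an assumption made for illustration: a model is a list `M` of sets of true propositions indexed by time together with interval endpoints `lo` and `hi`, a state is a `frozenset` of proposition names, an MA timeline is a tuple of states, and an AMA formula is a list of timelines. The function names are hypothetical.

```python
def satisfies_state(M, lo, hi, state):
    """A state holds on [lo, hi] iff all its propositions are true at every time point."""
    return all(state <= M[x] for x in range(lo, hi + 1))


def satisfies_ma(M, lo, hi, timeline):
    """Check an MA timeline s1;...;sn against the model (M, [lo, hi])."""
    first, rest = timeline[0], timeline[1:]
    if not rest:
        return satisfies_state(M, lo, hi, first)
    for mid in range(lo, hi + 1):  # candidate t''
        if not first <= M[mid]:
            break  # s1 fails at mid, so no later t'' can work either
        # Case 1: the transition happens inside time point mid.
        if satisfies_ma(M, mid, hi, rest):
            return True
        # Case 2: the transition happens at the boundary between mid and mid + 1.
        if mid + 1 <= hi and satisfies_ma(M, mid + 1, hi, rest):
            return True
    return False


def satisfies_ama(M, lo, hi, formula):
    """An AMA formula is satisfied iff every one of its MA timelines is."""
    return all(satisfies_ma(M, lo, hi, timeline) for timeline in formula)


A, B = frozenset({"A"}), frozenset({"B"})
M = [{"A"}, {"A", "B"}, {"B"}]  # truth assignments at times 0, 1, 2
```

With this model, `satisfies_ma(M, 0, 2, (A, B))` holds while `satisfies_ma(M, 0, 2, (B, A))` does not. The recursion explores both transition cases and can take exponential time in the worst case, so it is meant only to make the definition concrete.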

When $\mathcal{M}$ satisfies $\Phi$ we say that $\mathcal{M}$ is a model of $\Phi$ or that $\Phi$ covers $\mathcal{M}$. We say that AMA $\Psi_1$
subsumes AMA $\Psi_2$ iff every model of $\Psi_2$ is a model of $\Psi_1$, written $\Psi_2 \le \Psi_1$, and we say that $\Psi_1$
properly subsumes $\Psi_2$, written $\Psi_2 < \Psi_1$, when we also have $\Psi_1 \not\le \Psi_2$. Alternatively, we may state
$\Psi_2 \le \Psi_1$ by saying that $\Psi_1$ is more general (or less specific) than $\Psi_2$ or that $\Psi_1$ covers $\Psi_2$. Siskind
(2001) provides a method to determine whether a given model satisfies a given AMA formula.

Finally, it will be useful to associate a distinguished MA timeline to a model. The MA projection
of a model $\mathcal{M} = \langle M, [i, j] \rangle$, written as $\mathrm{MAP}(\mathcal{M})$, is an MA timeline $s_0; s_1; \ldots; s_{j-i}$ where state $s_k$
gives the true propositions in $M(i + k)$ for $0 \le k \le j - i$. Intuitively, the MA projection gives
the sequence of propositional truth assignments from the beginning to the end of the model. Later
we show that the MA projection of a model can be viewed as representing that model in a precise
sense.

The following two examples illustrate some basic behaviors of AMA formulas: 

Example 1 (Stretchability). $S_1; S_2; S_3$, $S_1; S_2; S_2; \ldots; S_2; S_3$, and $S_1; S_1; S_1; S_2; S_3; S_3; S_3$ are
all equivalent MA timelines. In general, MA timelines have the property that duplicating any state
results in a formula equivalent to the original formula. Recall that, given a model $\langle M, I \rangle$, we
view each truth assignment $M[x]$ as representing a continuous time-interval. This interval can
conceptually be divided into an arbitrary number of subintervals. Thus if state $S$ is satisfied by
$\langle M, [x, x] \rangle$, then so is the state sequence $S; S; \ldots; S$.
Example 2 (Infinite Descending Chains). Given propositions $A$ and $B$, the MA timeline
$\Phi = (A \wedge B)$ is subsumed by each of the formulas $A; B$, $A; B; A; B$, $A; B; A; B; A; B$, .... This is
intuitively clear when our semantics are viewed from a continuous-time perspective. Any interval
in which both $A$ and $B$ are true can be broken up into an arbitrary number of subintervals where
both $A$ and $B$ hold. This example illustrates that there can be infinite descending chains of AMA
formulas where the entire chain subsumes a given formula (but no member is equivalent to the given
formula). In general, any AMA formula involving only the propositions $A$ and $B$ will subsume $\Phi$.

3.2 Motivation for AMA 

MA timelines are a very natural way to capture stretchable sequences of state constraints. But 
why consider the conjunction of such sequences, i.e., AMA? We have several reasons for this lan- 
guage enrichment. First of all, we show below that the AMA least-general generalization (LGG) 
is unique — this is not true for MA. Second, and more informally, we argue that parallel conjunc- 
tive constraints can be important to learning efficiency. In particular, the space of MA formulas 
of length k grows in size exponentially with k, making it difficult to induce long MA formulas. 
However, finding several shorter MA timelines that each characterize part of a long sequence of 
changes is exponentially easier. (At least, the space to search is exponentially smaller.) The AMA 
conjunction of these timelines places these shorter constraints simultaneously and often captures a 
great deal of the concept structure. For this reason, we analyze AMA as well as MA and, in our 
empirical work, we consider $k$-AMA.

The AMA language is propositional. But our intended applications are relational, or first-order,
including visual-event recognition. Later in this paper, we show that the propositional AMA learning
algorithms that we develop can be effectively applied in relational domains. Our approach to
first-order learning is distinctive in automatically constructing an object correspondence across
examples (cf. Lavrac, Dzeroski, & Grobelnik, 1991; Roth & Yih, 2001). Similarly, though AMA
does not allow for negative state constraints, in Section 5.4 we discuss how to extend our results to
incorporate negation into our learning algorithms, which is crucial in visual-event recognition.

3.3 Conversion to First-Order Clauses 

We note that AMA formulas can be translated in various ways into first-order clauses. It is not 
straightforward, however, to then use existing clausal generalization techniques for learning. In 
particular, to capture the AMA semantics in clauses, it appears necessary to define subsumption and 
generalization relative to a background theory that restricts us to a "continuous-time" first-order- 
model space. 

For example, consider the AMA formulas $\Phi_1 = A \wedge B$ and $\Phi_2 = A; B$, where $A$ and $B$ are
propositions; from Example 2 we know that $\Phi_1 \le \Phi_2$. Now, consider a straightforward clausal
translation of these formulas giving $C_1 = A(I) \wedge B(I)$ and
$C_2 = A(I_1) \wedge B(I_2) \wedge \mathrm{Meets}(I_1, I_2) \wedge I = \mathrm{Span}(I_1, I_2)$, where $I$ and the $I_j$ are variables that
represent time intervals, Meets indicates that two time intervals meet each other, and Span is a
function that returns a time interval equal to the union of its two time-interval arguments. The
meaning we intend to capture is for satisfying assignments of $I$ in $C_1$ and $C_2$ to indicate intervals
over which $\Phi_1$ and $\Phi_2$ are satisfied, respectively. It should be clear that, contrary to what we want,
$C_1 \not\le C_2$ (i.e., $\not\models C_1 \rightarrow C_2$), since it is easy to find unintended first-order models that satisfy
$C_1$ but not $C_2$. Thus such a translation, and other similar translations, do not capture the
continuous-time nature of the AMA semantics.
In order to capture the AMA semantics in a clausal setting, one might define a first-order theory
that restricts us to continuous-time models, for example, allowing for the derivation "if property B
holds over an interval, then that property also holds over all sub-intervals." Given such a theory $\Sigma$,
we have that $\Sigma \models C_1 \rightarrow C_2$, as desired. However, it is well known that least-general
generalizations relative to such background theories need not exist (Plotkin, 1971), so prior work on
clausal generalization does not simply subsume our results for the AMA language.

We note that for a particular training set, it may be possible to compile a continuous-time
background theory $\Sigma$ into a finite but adequate set of ground facts. Relative to such ground theories,
clausal LGGs are known to always exist and thus could be used for our application. However,
the only such compiling approaches that look promising to us require exploiting an analysis
similar to the one given in this paper, i.e., understanding the AMA generalization and subsumption
problem separately from clausal generalization and exploiting that understanding in compiling the
background theory. We have not pursued such compilations further.

Even if we are given such a compilation procedure, there are other problems with using existing
clausal generalization techniques for learning AMA formulas. For the clausal translations of
AMA we have found, the resulting generalizations typically fall outside of the (clausal translations
of formulas in the) AMA language, so that the language bias of AMA is lost. In preliminary
empirical work in our video-event recognition domain using clausal inductive-logic-programming (ILP)
systems, we found that the learner appeared to lack the necessary language bias to find effective
event definitions. While we believe that it would be possible to find ways to build this language bias
into ILP systems, we chose instead to define and learn within the desired language bias directly, by
defining the class of AMA formulas, and studying the generalization operation on that class.

3.4 Basic Concepts and Properties of AMA 

We use the following convention in naming our results: "propositions" and "theorems" are the key 
results of our work, with theorems being those results of the most technical difficulty, and "lemmas" 
are technical results needed for the later proofs of propositions or theorems. We number all the 
results in one sequence, regardless of type. Proofs of theorems and propositions are provided in the 
main text — omitted proofs of lemmas are provided in the appendix. 

We give pseudo-code for our methods in a non-deterministic style. In a non-deterministic lan- 
guage functions can return more than one value non-deterministically, either because they contain 
non-deterministic choice points, or because they call other non-deterministic functions. Since a non- 
deterministic function can return more than one possible value, depending on the choices made at 
the choice points encountered, specifying such a function is a natural way to specify a richly struc- 
tured set (if the function has no arguments) or relation (if the function has arguments). To actually 
enumerate the values of the set (or the relation, once arguments are provided) one can simply use 
a standard backtracking search over the different possible computations corresponding to different 
choices at the choice points. 

3.4.1 Subsumption and Generalization for States

The most basic formulas we deal with are states (conjunctions of propositions). In our propositional
setting, computing subsumption and generalization at the state level is straightforward. A state $S_1$
subsumes $S_2$ ($S_2 \le S_1$) iff $S_1$ is a subset of $S_2$, viewing states as sets of propositions. From this, we
derive that the intersection of states is the least-general subsumer of those states and that the union
of states is likewise the most general subsumee.
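Viewing states as sets makes these three operations one-liners. The sketch below assumes states are represented as Python `frozenset` objects of proposition names; the function names are illustrative and do not come from the paper.

```python
def state_subsumes(s1, s2):
    """s1 subsumes s2 (s2 <= s1 in the paper's ordering) iff s1 is a subset of s2."""
    return s1 <= s2


def state_lgg(states):
    """Least-general subsumer of a non-empty list of states: their intersection."""
    return frozenset.intersection(*states)


def state_mgs(states):
    """Most general subsumee of a list of states: their union."""
    return frozenset().union(*states)


AB = frozenset({"A", "B"})
AC = frozenset({"A", "C"})
```

Here `state_lgg([AB, AC])` is the state `{A}`, which subsumes both inputs, and `state_mgs([AB, AC])` is `{A, B, C}`, which both inputs subsume. Note the direction: fewer propositions means a weaker restriction and hence a more general state.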
3.4.2 Interdigitations

Given a set of MA timelines, we need to consider the different ways in which a model could si- 
multaneously satisfy the timelines in the set. At the start of such a model (i.e., the first time point), 
the initial state from each timeline must be satisfied. At some time point in the model, one or more 
of the timelines can transition so that the second state in those timelines must be satisfied in place
of the initial state, while the initial state of the other timelines remains satisfied. After a sequence 
of such transitions in subsets of the timelines, the final state of each timeline holds. Each way of 
choosing the transition sequence constitutes a different interdigitation of the timelines. 

Viewed differently, each model simultaneously satisfying the timelines induces a co-occurrence 
relation on tuples of timeline states, one from each timeline, identifying which tuples co-occur at 
some point in the model. We represent this concept formally as a set of tuples of co-occurring states, 
i.e., a co-occurrence relation. We sometimes think of this set of tuples as ordered by the sequence 
of transitions. Intuitively, the tuples in an interdigitation represent the maximal time intervals over 
which no MA timeline has a transition, with those tuples giving the co-occurring states for each 
such time interval. 

A relation $R$ on $X_1 \times \cdots \times X_n$ is simultaneously consistent with orderings $\le_1, \ldots, \le_n$ if,
whenever $R(x_1, \ldots, x_n)$ and $R(x'_1, \ldots, x'_n)$, either $x_i \le_i x'_i$ for all $i$, or $x'_i \le_i x_i$ for all $i$. We say
$R$ is piecewise total if the projection of $R$ onto each component is total, i.e., every state in any $X_i$
appears in $R$.

Definition 1 (Interdigitation). An interdigitation $I$ of a set $\{\Phi_1, \ldots, \Phi_n\}$ of MA timelines is a
co-occurrence relation over $\Phi_1 \times \cdots \times \Phi_n$ (viewing timelines as sets of states; see footnote 6) that is
piecewise total and simultaneously consistent with the state orderings of each $\Phi_i$. We say that two
states $s \in \Phi_i$ and $s' \in \Phi_j$ for $i \ne j$ co-occur in $I$ iff some tuple of $I$ contains both $s$ and $s'$. We
sometimes refer to $I$ as a sequence of tuples, meaning the sequence lexicographically ordered by the
$\Phi_i$ state orderings.

We note that there are exponentially many interdigitations of even two MA timelines (relative to the
total number of states in the timelines). Example 3 in Section 3.4.4 shows an interdigitation of two MA
timelines. Pseudo-code for non-deterministically generating an arbitrary interdigitation for a set of
MA timelines can be found in Figure 5. Given an interdigitation $I$ of the timelines $s_1; s_2; \ldots; s_m$
and $t_1; t_2; \ldots; t_n$ (and possibly others), the following basic properties of interdigitations are easily
verifiable:

1. For $i < j$, if $s_i$ and $t_k$ co-occur in $I$ then for all $k' < k$, $s_j$ does not co-occur with $t_{k'}$ in $I$.

2. $I(s_1, t_1)$ and $I(s_m, t_n)$.

We first use interdigitations to syntactically characterize subsumption between MA timelines. 

Definition 2 (Witnessing Interdigitation). An interdigitation $I$ of two MA timelines $\Phi_1$ and $\Phi_2$
is a witness to $\Phi_1 \le \Phi_2$ iff for every pair of co-occurring states $s_1 \in \Phi_1$ and $s_2 \in \Phi_2$, we have that
$s_2$ is a subset of $s_1$ (i.e., $s_1 \le s_2$).

The following lemma and proposition establish the equivalence between witnessing interdigitations 
and MA subsumption. 

6. Recall that, formally, MA timelines are viewed as sets of state-index pairs, rather than just sets of states. We ignore
this distinction in our notation, for readability purposes, treating MA timelines as though no state is duplicated.
an-interdigitation($\{\Phi_1, \Phi_2, \ldots, \Phi_n\}$)

    // Input: MA timelines $\Phi_1, \ldots, \Phi_n$
    // Output: an interdigitation of $\{\Phi_1, \ldots, \Phi_n\}$

    $S_0$ := $\langle$head($\Phi_1$), ..., head($\Phi_n$)$\rangle$;
    if for all $1 \le i \le n$, $|\Phi_i| = 1$
        then return $\langle S_0 \rangle$;
    $T'$ := $\{\Phi_i$ such that $|\Phi_i| > 1\}$;
    $T''$ := a-non-empty-subset-of($T'$);
    for $i$ := 1 to $n$
        if $\Phi_i \in T''$
            then $\Phi'_i$ := rest($\Phi_i$)
            else $\Phi'_i$ := $\Phi_i$;
    return extend-tuple($S_0$, an-interdigitation($\{\Phi'_1, \ldots, \Phi'_n\}$));

Figure 5: Pseudo-code for an-interdigitation(), which non-deterministically computes an
interdigitation for a set $\{\Phi_1, \ldots, \Phi_n\}$ of MA timelines. The function head($\Phi$) returns the first
state in the timeline $\Phi$. rest($\Phi$) returns $\Phi$ with the first state removed. extend-tuple($x$, $I$)
extends a tuple $I$ by adding a new first element $x$ to form a longer tuple. a-non-empty-subset-of($S$)
non-deterministically returns an arbitrary non-empty subset of $S$.
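One way to realize the non-deterministic procedure of Figure 5 in a deterministic language is a generator that backtracks over every choice of non-empty subset. The Python sketch below follows that idea under illustrative assumptions: timelines are tuples of arbitrary state objects (plain strings in the example), an interdigitation is returned as a tuple of state tuples in sequence order, and the names `interdigitations` and `non_empty_subsets` are hypothetical.

```python
from itertools import combinations


def non_empty_subsets(items):
    """Enumerate every non-empty subset of items (the choice point of Figure 5)."""
    items = list(items)
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)


def interdigitations(timelines):
    """Yield every interdigitation of the given timelines as a tuple of state tuples."""
    timelines = [tuple(t) for t in timelines]
    head = tuple(t[0] for t in timelines)  # currently co-occurring states
    advanceable = [i for i, t in enumerate(timelines) if len(t) > 1]
    if not advanceable:
        yield (head,)  # every timeline is in its final state
        return
    for chosen in non_empty_subsets(advanceable):
        # Timelines in the chosen subset transition to their next state.
        rest = [t[1:] if i in chosen else t for i, t in enumerate(timelines)]
        for tail in interdigitations(rest):
            yield (head,) + tail


S = ("s1", "s2", "s3")
T = ("t1", "t2", "t3")
all_of_S_and_T = list(interdigitations([S, T]))
```

For two timelines of three states each this enumerates 13 interdigitations, and for two timelines of two states each it enumerates 3; the count grows exponentially with the total number of states, as noted in the text.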



Lemma 1. For any MA timeline $\Phi$ and any model $\mathcal{M}$, if $\mathcal{M}$ satisfies $\Phi$, then there is a witnessing
interdigitation for $\mathrm{MAP}(\mathcal{M}) \le \Phi$.

Proposition 2. For MA timelines $\Phi_1$ and $\Phi_2$, $\Phi_1 \le \Phi_2$ iff there is an interdigitation that witnesses
$\Phi_1 \le \Phi_2$.

Proof: We show the backward direction by induction on the number of states $n$ in timeline $\Phi_1$. If
$n = 1$, then the existence of a witnessing interdigitation for $\Phi_1 \le \Phi_2$ implies that every state in $\Phi_2$
is a subset of the single state in $\Phi_1$, and thus that any model of $\Phi_1$ is a model of $\Phi_2$ so that $\Phi_1 \le \Phi_2$.
Now, suppose for induction that the backward direction of the proposition holds whenever $\Phi_1$ has $n$
or fewer states. Given an arbitrary model $\mathcal{M}$ of an $n+1$ state $\Phi_1$ and an interdigitation $W$ that
witnesses $\Phi_1 \le \Phi_2$, we must show that $\mathcal{M}$ is also a model of $\Phi_2$ to conclude $\Phi_1 \le \Phi_2$ as desired.

Write $\Phi_1$ as $s_1; \ldots; s_{n+1}$ and $\Phi_2$ as $t_1; \ldots; t_m$. As a witnessing interdigitation, $W$ must identify
some maximal prefix $t_1; \ldots; t_{m'}$ of $\Phi_2$ made up of states that co-occur with $s_1$ and thus that are
subsets of $s_1$. Since $\mathcal{M} = \langle M, [t, t'] \rangle$ satisfies $\Phi_1$, by definition there must exist a $t'' \in [t, t']$ such
that $\langle M, [t, t''] \rangle$ satisfies $s_1$ (and thus $t_1; \ldots; t_{m'}$) and $\langle M, I' \rangle$ satisfies $s_2; \ldots; s_{n+1}$ for $I'$ equal to
either $[t'', t']$ or $[t''+1, t']$. In either case, it is straightforward to construct, from $W$, a witnessing
interdigitation for $s_2; \ldots; s_{n+1} \le t_{m'+1}; \ldots; t_m$ and use the induction hypothesis to then show that
$\langle M, I' \rangle$ must satisfy $t_{m'+1}; \ldots; t_m$. It follows that $\mathcal{M}$ satisfies $\Phi_2$ as desired.

For the forward direction, assume that $\Phi_1 \le \Phi_2$, and let $\mathcal{M}$ be any model such that
$\Phi_1 = \mathrm{MAP}(\mathcal{M})$. It is clear that such an $\mathcal{M}$ exists and satisfies $\Phi_1$. It follows that $\mathcal{M}$ satisfies $\Phi_2$.
Lemma 1 then implies that there is a witnessing interdigitation for $\mathrm{MAP}(\mathcal{M}) \le \Phi_2$ and thus for
$\Phi_1 \le \Phi_2$. □
3.4.3 Least-General Covering Formula

A logic can discriminate two models if it contains a formula that is satisfied by one but not the other. It
turns out that AMA formulas can discriminate two models exactly when much richer internal positive
event logic (IPEL) formulas can do so. Internal formulas are those that define event occurrence
only in terms of properties within the defining interval. That is, satisfaction by $\langle M, I \rangle$ depends only
on the proposition truth values given by $M$ inside the interval $I$. Positive formulas are those that
do not contain negation. Appendix A gives the full syntax and semantics of IPEL (which are used
only to state and prove Lemma 3). The fact that AMA can discriminate models as well as IPEL
indicates that our restriction to AMA formulas retains substantial expressive power and leads to
the following result, which serves as the least-general covering formula (LGCF) component of our
specific-to-general learning procedure. Formally, an LGCF of model $\mathcal{M}$ within a formula language
$\mathcal{L}$ (e.g., AMA or IPEL) is a formula in $\mathcal{L}$ that covers $\mathcal{M}$ such that no other covering formula in
$\mathcal{L}$ is strictly less general. Intuitively, the LGCF of a model, if unique, is the "most representative"
formula of that model. Our analysis uses the concept of model embedding. We say that model $\mathcal{M}$
embeds model $\mathcal{M}'$ iff $\mathrm{MAP}(\mathcal{M}) \le \mathrm{MAP}(\mathcal{M}')$.

Lemma 3. For any $E \in$ IPEL, if model $\mathcal{M}$ embeds a model that satisfies $E$, then $\mathcal{M}$ satisfies $E$.

Proposition 4. The MA projection of a model is its LGCF for internal positive event logic (and
hence for AMA), up to semantic equivalence.

Proof: Consider model $\mathcal{M}$. We know that $\mathrm{MAP}(\mathcal{M})$ covers $\mathcal{M}$, so it remains to show that
$\mathrm{MAP}(\mathcal{M})$ is the least general formula to do so, up to semantic equivalence.

Let $E$ be any IPEL formula that covers $\mathcal{M}$. Let $\mathcal{M}'$ be any model that is covered by $\mathrm{MAP}(\mathcal{M})$;
we want to show that $E$ also covers $\mathcal{M}'$. We know, from Lemma 1, that there is a witnessing
interdigitation for $\mathrm{MAP}(\mathcal{M}') \le \mathrm{MAP}(\mathcal{M})$. Thus, by Proposition 2, $\mathrm{MAP}(\mathcal{M}') \le \mathrm{MAP}(\mathcal{M})$,
showing that $\mathcal{M}'$ embeds $\mathcal{M}$. Combining these facts with Lemma 3, it follows that $E$ also covers
$\mathcal{M}'$ and hence $\mathrm{MAP}(\mathcal{M}) \le E$. □

Proposition 4 tells us that, for IPEL, the LGCF of a model exists, is unique, and is an MA
timeline. Given this property, when an AMA formula $\Psi_1$ covers all the MA timelines covered by
another AMA formula $\Psi_2$, we have $\Psi_2 \le \Psi_1$. Thus, for the remainder of this paper, when considering
subsumption between formulas, we can abstract away from temporal models and deal instead with
MA timelines. Proposition 4 also tells us that we can compute the LGCF of a model by constructing
the MA projection of that model. Based on the definition of MA projection, it is straightforward to
derive an LGCF algorithm which runs in time polynomial in the size of the model (footnote 7). We note that
the MA projection may contain repeated states. In practice, we remove repeated states, since this
does not change the meaning of the resulting formula (as described in Example 1).
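A minimal Python sketch of this LGCF computation follows. It assumes a model is given as a list `M` of sets of true propositions plus interval endpoints `lo` and `hi`, and it interprets "removing repeated states" as collapsing runs of consecutive identical states, which is the case covered by Example 1; the function names are illustrative.

```python
from itertools import groupby


def ma_projection(M, lo, hi):
    """MAP of the model (M, [lo, hi]): one state per time point, in temporal order."""
    return tuple(frozenset(M[x]) for x in range(lo, hi + 1))


def lgcf(M, lo, hi):
    """MA projection with each run of consecutive identical states collapsed to one state."""
    return tuple(state for state, _run in groupby(ma_projection(M, lo, hi)))


M = [{"A"}, {"A"}, {"A", "B"}, {"B"}, {"B"}]
```

For this model, `lgcf(M, 0, 4)` is the three-state timeline `{A}; {A, B}; {B}`. Only adjacent duplicates are merged: a timeline such as `A; B; A` is left unchanged, since its two occurrences of `A` are separated by a different state.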

3.4.4 Combining Interdigitation with Generalization or Specialization 

Interdigitations are useful in analyzing both conjunctions and disjunctions of MA timelines. When
conjoining a set of timelines, any model of the conjunction induces an interdigitation of the timelines
such that co-occurring states simultaneously hold in the model at some point (viewing states as
sets, the states resulting from unioning co-occurring states must hold). By constructing an
interdigitation and taking the union of each tuple of co-occurring states to get a sequence of states,
we get an MA timeline that forces the conjunction of the timelines to hold. We call such a sequence
an interdigitation specialization of the timelines. Dually, an interdigitation generalization involving
intersections of states gives an MA timeline that holds whenever the disjunction of a set of timelines
holds.

7. We take the size of a model $\mathcal{M} = \langle M, I \rangle$ to be the sum over $x \in I$ of the number of true propositions in $M(x)$.

Definition 3. An interdigitation generalization (specialization) of a set $\Sigma$ of MA timelines is an MA
timeline $s_1; \ldots; s_m$ such that, for some interdigitation $I$ of $\Sigma$ with $m$ tuples, $s_j$ is the intersection
(respectively, union) of the components of the $j$'th tuple of the sequence $I$. The set of interdigitation
generalizations (respectively, specializations) of $\Sigma$ is called $\mathrm{IG}(\Sigma)$ (respectively, $\mathrm{IS}(\Sigma)$).

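Example 3. Suppose that $s_1$, $s_2$, $s_3$, $t_1$, $t_2$, and $t_3$ are each sets of propositions (i.e., states).
Consider the timelines $S = s_1; s_2; s_3$ and $T = t_1; t_2; t_3$. The relation

$$\{\ \langle s_1, t_1 \rangle,\ \langle s_2, t_1 \rangle,\ \langle s_3, t_2 \rangle,\ \langle s_3, t_3 \rangle\ \}$$

is an interdigitation of $S$ and $T$ in which states $s_1$ and $s_2$ co-occur with $t_1$, and $s_3$ co-occurs with
$t_2$ and $t_3$. The corresponding IG and IS members are

$$s_1 \cap t_1;\ s_2 \cap t_1;\ s_3 \cap t_2;\ s_3 \cap t_3 \in \mathrm{IG}(\{S, T\})$$
$$s_1 \cup t_1;\ s_2 \cup t_1;\ s_3 \cup t_2;\ s_3 \cup t_3 \in \mathrm{IS}(\{S, T\}).$$

If $t_1 \subseteq s_1$, $t_1 \subseteq s_2$, $t_2 \subseteq s_3$, and $t_3 \subseteq s_3$, then the interdigitation witnesses $S \le T$.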

Each timeline in $\mathrm{IG}(\Sigma)$ (dually, $\mathrm{IS}(\Sigma)$) subsumes (is subsumed by) each timeline in $\Sigma$; this is
easily verified using Proposition 2. For our complexity analyses, we note that the number of states
in any member of $\mathrm{IG}(\Sigma)$ or $\mathrm{IS}(\Sigma)$ is bounded from below by the number of states in any of the
MA timelines in $\Sigma$ and is bounded from above by the total number of states in all the MA timelines
in $\Sigma$. The number of interdigitations of $\Sigma$, and thus of members of $\mathrm{IG}(\Sigma)$ or $\mathrm{IS}(\Sigma)$, is exponential
in that same total number of states. The algorithms that we present later for computing LGGs
require the computation of both $\mathrm{IG}(\Sigma)$ and $\mathrm{IS}(\Sigma)$. Here we give pseudo-code to compute these
quantities. Figure 6 gives pseudo-code for the function an-IG-member that non-deterministically
computes an arbitrary member of $\mathrm{IG}(\Sigma)$ (an-IS-member is the same, except that we replace
intersection by union). Given a set $\Sigma$ of MA timelines we can compute $\mathrm{IG}(\Sigma)$ by executing all possible
deterministic computation paths of the function call an-IG-member($\Sigma$), i.e., computing the set of
results obtainable from the non-deterministic function for all possible decisions at non-deterministic
choice points.

We now give a useful lemma and a proposition concerning the relationships between conjunc- 
tions and disjunctions of MA concepts (the former being AMA concepts). For convenience here, 
we use disjunction on MA concepts, producing formulas outside of AMA with the obvious inter- 
pretation. 

Lemma 5. Given an MA formula $\Phi$ that subsumes each member of a set $\Sigma$ of MA formulas, $\Phi$ also
subsumes some member $\Phi'$ of $\mathrm{IG}(\Sigma)$. Dually, when $\Phi$ is subsumed by each member of $\Sigma$, we have
that $\Phi$ is also subsumed by some member $\Phi'$ of $\mathrm{IS}(\Sigma)$. In each case, the length of $\Phi'$ is bounded by
the size of $\Sigma$.
an-IG-member($\{\Phi_1, \Phi_2, \ldots, \Phi_n\}$)

    // Input: MA timelines $\Phi_1, \ldots, \Phi_n$
    // Output: a member of $\mathrm{IG}(\{\Phi_1, \Phi_2, \ldots, \Phi_n\})$

    return map(intersect-tuple, an-interdigitation($\{\Phi_1, \ldots, \Phi_n\}$));

Figure 6: Pseudo-code for an-IG-member, which non-deterministically computes a member of
$\mathrm{IG}(\Sigma)$ where $\Sigma$ is a set of MA timelines. The function intersect-tuple($I$) takes a tuple $I$
of sets as its argument and returns their intersection. The higher-order function map($f$, $I$)
takes a function $f$ and a tuple $I$ as arguments and returns a tuple of the same length as $I$
obtained by applying $f$ to each element of $I$ and making a tuple of the results.
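To make the map-over-tuples step concrete, the following Python sketch applies it to the interdigitation of Example 3 with specific, made-up proposition sets. The interdigitation is written out by hand rather than generated, states are `frozenset` objects, and the function names and the concrete states are illustrative assumptions.

```python
def intersect_tuple(tup):
    """Intersection of all states in one tuple of co-occurring states."""
    return frozenset.intersection(*tup)


def union_tuple(tup):
    """Union of all states in one tuple of co-occurring states."""
    return frozenset().union(*tup)


def ig_member(interdigitation):
    """Interdigitation generalization: intersect each tuple, keeping sequence order."""
    return tuple(map(intersect_tuple, interdigitation))


def is_member(interdigitation):
    """Interdigitation specialization: union each tuple, keeping sequence order."""
    return tuple(map(union_tuple, interdigitation))


def witnesses(interdigitation):
    """For a two-timeline interdigitation of (S, T): does it witness S <= T?"""
    return all(t <= s for s, t in interdigitation)


# Hypothetical concrete states for the timelines S = s1;s2;s3 and T = t1;t2;t3.
s1, s2, s3 = frozenset({"A"}), frozenset({"A", "B"}), frozenset({"B", "C"})
t1, t2, t3 = frozenset({"A"}), frozenset({"B"}), frozenset({"C"})
example3 = ((s1, t1), (s2, t1), (s3, t2), (s3, t3))
```

With these states, `ig_member(example3)` is `{A}; {A}; {B}; {C}` and `is_member(example3)` is `{A}; {A, B}; {B, C}; {B, C}`. Because each `t` state is a subset of the `s` state it co-occurs with, `witnesses(example3)` holds, so this interdigitation witnesses S ≤ T.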



Proposition 6. The following hold:

1. (and-to-or) The conjunction of a set $\Sigma$ of MA timelines equals the disjunction of the timelines
in $\mathrm{IS}(\Sigma)$.

2. (or-to-and) The disjunction of a set $\Sigma$ of MA timelines is subsumed by the conjunction of the
timelines in $\mathrm{IG}(\Sigma)$.

Proof: To prove or-to-and, recall that, for any $\Phi \in \Sigma$ and any $\Phi' \in \mathrm{IG}(\Sigma)$, we have that $\Phi \le \Phi'$.
From this it is immediate that $(\bigvee \Sigma) \le (\bigwedge \mathrm{IG}(\Sigma))$. Using a dual argument, we can show that
$(\bigvee \mathrm{IS}(\Sigma)) \le (\bigwedge \Sigma)$. It remains to show that $(\bigwedge \Sigma) \le (\bigvee \mathrm{IS}(\Sigma))$, which is equivalent to showing
that any timeline subsumed by $(\bigwedge \Sigma)$ is also subsumed by $(\bigvee \mathrm{IS}(\Sigma))$ (by Proposition 4). Consider
any MA timeline $\Phi$ such that $\Phi \le (\bigwedge \Sigma)$; this implies that each member of $\Sigma$ subsumes $\Phi$. Lemma
5 then implies that there is some $\Phi' \in \mathrm{IS}(\Sigma)$ such that $\Phi \le \Phi'$. From this we get that $\Phi \le (\bigvee \mathrm{IS}(\Sigma))$
as desired. □

Using and-to-or, we can now reduce AMA subsumption to MA subsumption, with an exponen- 
tial increase in the problem size. 

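Proposition 7. For AMA $\Psi_1$ and $\Psi_2$, $\Psi_1 \le \Psi_2$ if and only if for all $\Phi_1 \in \mathrm{IS}(\Psi_1)$ and
$\Phi_2 \in \Psi_2$, $\Phi_1 \le \Phi_2$.

Proof: For the forward direction we show the contrapositive. Assume there is a $\Phi_1 \in \mathrm{IS}(\Psi_1)$ and a
$\Phi_2 \in \Psi_2$ such that $\Phi_1 \not\le \Phi_2$. Thus, there is an MA timeline $\Phi$ such that $\Phi \le \Phi_1$ but $\Phi \not\le \Phi_2$. This
tells us that $\Phi \le (\bigvee \mathrm{IS}(\Psi_1))$ and that $\Phi \not\le \Psi_2$, thus $(\bigvee \mathrm{IS}(\Psi_1)) \not\le \Psi_2$ and by "and-to-or" we get
that $\Psi_1 \not\le \Psi_2$.

For the backward direction assume that for all $\Phi_1 \in \mathrm{IS}(\Psi_1)$ and $\Phi_2 \in \Psi_2$ that $\Phi_1 \le \Phi_2$. This
tells us that for each $\Phi_1 \in \mathrm{IS}(\Psi_1)$, that $\Phi_1 \le \Psi_2$; thus, $\Psi_1 = (\bigvee \mathrm{IS}(\Psi_1)) \le \Psi_2$. □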
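Propositions 2 and 7 together already give a brute-force decision procedure for AMA subsumption. The Python sketch below is an illustrative, exponential-time transcription of that reduction, not an optimized algorithm: MA subsumption is tested by searching all interdigitations for a witness, and AMA subsumption by checking every member of IS of the first formula against every timeline of the second. States are assumed to be `frozenset` objects, timelines tuples of states, and AMA formulas lists of timelines; all function names are hypothetical.

```python
from itertools import combinations


def interdigitations(timelines):
    """Yield every interdigitation of the timelines as a tuple of state tuples."""
    timelines = [tuple(t) for t in timelines]
    head = tuple(t[0] for t in timelines)
    advanceable = [i for i, t in enumerate(timelines) if len(t) > 1]
    if not advanceable:
        yield (head,)
        return
    for size in range(1, len(advanceable) + 1):
        for chosen in combinations(advanceable, size):
            rest = [t[1:] if i in chosen else t for i, t in enumerate(timelines)]
            for tail in interdigitations(rest):
                yield (head,) + tail


def ma_subsumed_by(phi1, phi2):
    """phi1 <= phi2 iff some interdigitation of the two is a witness (Proposition 2)."""
    return any(all(t <= s for s, t in inter)
               for inter in interdigitations([phi1, phi2]))


def interdigitation_specializations(timelines):
    """IS of a set of timelines: union the states in each tuple of each interdigitation."""
    return {tuple(frozenset().union(*tup) for tup in inter)
            for inter in interdigitations(timelines)}


def ama_subsumed_by(psi1, psi2):
    """psi1 <= psi2 iff every member of IS(psi1) is subsumed by every timeline of psi2."""
    return all(ma_subsumed_by(phi1, phi2)
               for phi1 in interdigitation_specializations(psi1)
               for phi2 in psi2)


A, B = frozenset({"A"}), frozenset({"B"})
both_throughout = [(A,), (B,)]  # conjunction of the one-state timelines A and B
a_then_b = [(A, B)]             # the single timeline A; B
```

Here `ama_subsumed_by(both_throughout, a_then_b)` holds, matching Example 2, while the converse `ama_subsumed_by(a_then_b, both_throughout)` does not.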

4. Subsumption and Generalization 

In this section we study subsumption and generalization of AMA formulas. First, we give a
polynomial-time algorithm for deciding subsumption between MA formulas and then show that
deciding subsumption for AMA formulas is coNP-complete. Second, we give algorithms and
complexity bounds for the construction of least-general generalization (LGG) formulas based on our
analysis of subsumption, including existence, uniqueness, lower/upper bounds, and an algorithm for
the LGG on AMA formulas. Third, we introduce a polynomial-time-computable syntactic notion
of subsumption and an algorithm that computes the corresponding syntactic LGG that is
exponentially faster than our semantic LGG algorithm. Fourth, in Section 4.4, we give a detailed example
showing the steps performed by our LGG algorithms to compute the semantic and syntactic LGGs
of two AMA formulas.

MA-subsumes($\Phi_1$, $\Phi_2$)

    // Input: $\Phi_1 = s_1; \ldots; s_m$ and $\Phi_2 = t_1; \ldots; t_n$
    // Output: $\Phi_1 \le \Phi_2$

    1. if there is a path from $v_{1,1}$ to $v_{m,n}$ in SG($\Phi_1$, $\Phi_2$) then return TRUE. For example,

       (a) Create an array Reachable($i$, $j$) of boolean values, all FALSE, for $0 \le i \le m$ and
           $0 \le j \le n$.

       (b) Reachable(0, 0) := TRUE;
           for $i$ := 1 to $m$
               for $j$ := 1 to $n$
                   Reachable($i$, $j$) := ($t_j \subseteq s_i$ $\wedge$ (Reachable($i-1$, $j$) $\vee$
                                                          Reachable($i$, $j-1$) $\vee$
                                                          Reachable($i-1$, $j-1$)));

       (c) if Reachable($m$, $n$) then return TRUE;

    2. Otherwise, return FALSE;

Figure 7: Pseudo-code for the MA subsumption algorithm. SG($\Phi_1$, $\Phi_2$) is the subsumption graph
defined in the main text.

4.1 Subsumption 

All our methods rely critically on a novel algorithm for deciding the subsumption question $\Phi_1 \le \Phi_2$
between MA formulas $\Phi_1$ and $\Phi_2$ in polynomial time. We note that merely searching the possible
interdigitations of $\Phi_1$ and $\Phi_2$ for a witnessing interdigitation provides an obvious decision procedure
for the subsumption question; however, there are, in general, exponentially many such
interdigitations. We reduce the MA subsumption problem to finding a path in a graph on pairs of states
in $\Phi_1 \times \Phi_2$, a polynomial-time operation. Pseudo-code for the resulting MA subsumption
algorithm is shown in Figure 7. The main data structure used by the MA subsumption algorithm is the
subsumption graph.

Definition 4. The subsumption graph of two MA timelines $\Phi_1 = s_1; \ldots; s_m$ and $\Phi_2 = t_1; \ldots; t_n$
(written $\mathrm{SG}(\Phi_1, \Phi_2)$) is a directed graph $G = \langle V, E \rangle$ with $V = \{v_{i,j} \mid 1 \le i \le m,\ 1 \le j \le n\}$.
The (directed) edge set $E$ equals $\{\langle v_{i,j}, v_{i',j'} \rangle \mid s_i \le t_j,\ s_{i'} \le t_{j'},\ i \le i' \le i+1,\ j \le j' \le j+1\}$.

To achieve a polynomial-time bound one can simply use any polynomial-time pathfinding
algorithm. In our case the special structure of the subsumption graph can be exploited to determine if
the desired path exists in $O(mn)$ time, as the example method shown in the pseudo-code illustrates.
The following theorem asserts the correctness of the algorithm assuming a correct polynomial-time
path-finding method is used.

Lemma 8. Given MA timelines $\Phi_1 = s_1; \ldots; s_m$ and $\Phi_2 = t_1; \ldots; t_n$, there
is a witnessing interdigitation for $\Phi_1 \le \Phi_2$ iff there is a path in the subsumption
graph $SG(\Phi_1, \Phi_2)$ from $v_{1,1}$ to $v_{m,n}$.

Theorem 9. Given MA timelines $\Phi_1$ and $\Phi_2$, MA-subsumes($\Phi_1$, $\Phi_2$) decides
$\Phi_1 \le \Phi_2$ in polynomial time.

Proof: The algorithm clearly runs in polynomial time. Lemma 8 tells us that line 2 of the
algorithm will return TRUE iff there is a witnessing interdigitation. Combining this with
Proposition 2 shows that the algorithm returns TRUE iff $\Phi_1 \le \Phi_2$. □
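
The path search over the subsumption graph can be sketched in Python as follows. This is a
minimal sketch under assumptions of our own, not a transcription of Figure 7: a state is
represented as a frozenset of proposition names (so the state true is the empty set), a
timeline is a non-empty tuple of states, and the names state_subsumed and ma_subsumes are
illustrative. The table is filled row by row, which is one way to stay within $O(mn)$ state
comparisons.

```python
from typing import FrozenSet, Sequence

State = FrozenSet[str]        # conjunction of propositions; frozenset() plays the role of `true`
Timeline = Sequence[State]    # MA timeline s1; ...; sm (assumed non-empty)


def state_subsumed(s: State, t: State) -> bool:
    """s <= t holds when every proposition required by t is also required by s."""
    return t <= s


def ma_subsumes(phi1: Timeline, phi2: Timeline) -> bool:
    """Decide phi1 <= phi2 by looking for a path from v(1,1) to v(m,n)."""
    m, n = len(phi1), len(phi2)
    # reach[i][j]: vertex (i, j) is reachable from (0, 0) through vertices with s_i <= t_j.
    reach = [[False] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            if not state_subsumed(phi1[i], phi2[j]):
                continue  # no edge may touch a vertex whose states do not match
            reach[i][j] = (
                (i == 0 and j == 0)
                or (i > 0 and reach[i - 1][j])
                or (j > 0 and reach[i][j - 1])
                or (i > 0 and j > 0 and reach[i - 1][j - 1])
            )
    return reach[m - 1][n - 1]


A, B, TRUE = frozenset("A"), frozenset("B"), frozenset()
AB = A | B
print(ma_subsumes((AB, B, AB), (A, B, A)))  # (A^B);B;(A^B) <= A;B;A  -> True
print(ma_subsumes((A, B), (A, B, A)))       # A;B <= A;B;A            -> False
```

The dynamic-programming table replaces an explicit graph search; any other polynomial-time
path-finding method over the same vertices and edges would serve equally well.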

Given this polynomial-time algorithm for MA subsumption, Proposition 7 immediately suggests
an exponential-time algorithm for deciding AMA subsumption: compute MA subsumption between the
exponentially many IS timelines of one formula and the timelines of the other formula.
Our next theorem suggests that we cannot do any better than this in the worst case: we argue
that AMA subsumption is coNP-complete by reduction from boolean satisfiability. Readers
uninterested in the technical details of this argument may skip directly to Section 4.2.
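
The exponential-time procedure suggested by Proposition 7 can be sketched as below. This is an
illustrative Python sketch rather than the authors' implementation. It assumes states are
frozensets of proposition names, an AMA formula is a list of timelines (tuples of states), and
an interdigitation can be enumerated as a sequence of index tuples in which every step advances
at least one timeline by exactly one state. The helper names are hypothetical.

```python
from itertools import product


def ma_subsumes(phi1, phi2):
    """Polynomial-time MA test: path search over pairs (i, j) with s_i <= t_j."""
    m, n = len(phi1), len(phi2)
    reach = [[False] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            if phi2[j] <= phi1[i]:  # s_i <= t_j: all of t_j's propositions occur in s_i
                reach[i][j] = (
                    (i == 0 and j == 0)
                    or (i > 0 and reach[i - 1][j])
                    or (j > 0 and reach[i][j - 1])
                    or (i > 0 and j > 0 and reach[i - 1][j - 1])
                )
    return reach[m - 1][n - 1]


def interdigitations(timelines):
    """Yield each interdigitation as a list of tuples of co-occurring states."""
    end = tuple(len(t) - 1 for t in timelines)

    def extend(path):
        cur = path[-1]
        if cur == end:
            yield [tuple(t[k] for t, k in zip(timelines, idx)) for idx in path]
            return
        for step in product((0, 1), repeat=len(cur)):
            nxt = tuple(c + d for c, d in zip(cur, step))
            if any(step) and all(x <= e for x, e in zip(nxt, end)):
                yield from extend(path + [nxt])

    yield from extend([tuple(0 for _ in timelines)])


def interdigitation_specializations(psi):
    """IS(psi): union (conjoin) the co-occurring states of every interdigitation."""
    return [tuple(frozenset().union(*tup) for tup in w) for w in interdigitations(psi)]


def ama_subsumes(psi1, psi2):
    """psi1 <= psi2 iff every member of IS(psi1) is subsumed by every timeline of psi2."""
    return all(
        ma_subsumes(p1, p2)
        for p1 in interdigitation_specializations(psi1)
        for p2 in psi2
    )


A, B = frozenset("A"), frozenset("B")
psi = [(A, B), (B, A)]                         # (A;B) ^ (B;A)
print(len(interdigitation_specializations(psi)))
print(ama_subsumes(psi, [(A, B, A)]))
```

The enumeration in interdigitations is where the exponential cost arises; each individual
ma_subsumes call remains polynomial.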

To develop a correspondence between boolean satisfiability problems, which include negation,
and AMA formulas, which lack negation, we imagine that each boolean variable has two AMA
propositions, one for "true" and one for "false." In particular, given a boolean satisfiability
problem over $n$ variables $p_1, \ldots, p_n$, we take the set $\mathrm{PROP}_n$ to be the set
containing the $2n$ AMA propositions $\mathit{True}_k$ and $\mathit{False}_k$ for each $k$
between 1 and $n$. We can now represent a truth assignment $A$ to the $p_i$ variables with an
AMA state $s_A$ given as follows:

$$s_A = \{\mathit{True}_i \mid 1 \le i \le n,\ A(p_i) = \mathit{true}\} \cup \{\mathit{False}_i \mid 1 \le i \le n,\ A(p_i) = \mathit{false}\}$$

As Proposition 7 suggests, checking AMA subsumption critically involves the exponentially
many interdigitation specializations of the timelines of one of the AMA formulas. In our proof,
we design an AMA formula whose interdigitation specializations can be seen to correspond to
truth assignments (see footnote 8) to boolean variables, as shown in the following lemma.

Lemma 10. Given some $n$, let $\Psi$ be the conjunction of the timelines

$$\bigcup_{i=1}^{n} \{ (\mathrm{PROP}_n; \mathit{True}_i; \mathit{False}_i; \mathrm{PROP}_n),\ (\mathrm{PROP}_n; \mathit{False}_i; \mathit{True}_i; \mathrm{PROP}_n) \}.$$

We have the following facts about truth assignments to the Boolean variables $p_1, \ldots, p_n$:

1. For any truth assignment $A$, $\mathrm{PROP}_n; s_A; \mathrm{PROP}_n$ is semantically
equivalent to a member of $IS(\Psi)$.

2. For each $\Phi \in IS(\Psi)$ there is a truth assignment $A$ such that
$\Phi \le \mathrm{PROP}_n; s_A; \mathrm{PROP}_n$.

8. A truth assignment is a function mapping boolean variables to true or false.

With this lemma in hand, we can now tackle the complexity of AMA subsumption.

Theorem 11. Deciding AMA subsumption is coNP-complete.

Proof: We first show that deciding the AMA-subsumption of $\Psi_1$ by $\Psi_2$ is in coNP by
providing a polynomial-length certificate for any "no" answer. This certificate for
non-subsumption is an interdigitation of the timelines of $\Psi_1$ that yields a member of
$IS(\Psi_1)$ not subsumed by $\Psi_2$. Such a certificate can be checked in polynomial time:
given such an interdigitation, the corresponding member of $IS(\Psi_1)$ can be computed in time
polynomial in the size of $\Psi_1$, and we can then test whether the resulting timeline is
subsumed by each timeline in $\Psi_2$ using the polynomial-time MA-subsumption algorithm.
Proposition 7 guarantees that $\Psi_1 \not\le \Psi_2$ iff there is a timeline in $IS(\Psi_1)$
that is not subsumed by every timeline in $\Psi_2$, so that such a certificate will exist
exactly when the answer to a subsumption query is "no."

To show coNP-hardness we reduce the problem of deciding the satisfiability of a 3-SAT formula
$S = C_1 \wedge \cdots \wedge C_m$ to the problem of recognizing non-subsumption between AMA
formulas. Here, each $C_i$ is $(l_{i,1} \vee l_{i,2} \vee l_{i,3})$ and each $l_{i,j}$ is either
a proposition $p$ chosen from $P = \{p_1, \ldots, p_n\}$ or its negation $\neg p$. The idea of
the reduction is to construct an AMA formula $\Psi$ for which we view the exponentially many
members of $IS(\Psi)$ as representing truth assignments. We then construct an MA timeline $\Phi$
that we view as representing $\neg S$ and show that $S$ is satisfiable iff $\Psi \not\le \Phi$.

Let $\Psi$ be as defined in Lemma 10. Let $\Phi$ be the formula $s_1; \ldots; s_m$, where

$$s_i = \{\mathit{False}_j \mid l_{i,k} = p_j \text{ for some } k\} \cup \{\mathit{True}_j \mid l_{i,k} = \neg p_j \text{ for some } k\}.$$

Each $s_i$ can be thought of as asserting "not $C_i$." We start by showing that if $S$ is
satisfiable then $\Psi \not\le \Phi$. Assume that $S$ is satisfied via a truth assignment $A$.
We know from Lemma 10 that there is a $\Phi' \in IS(\Psi)$ that is semantically equivalent to
$\mathrm{PROP}_n; s_A; \mathrm{PROP}_n$. We show that $\mathrm{PROP}_n; s_A; \mathrm{PROP}_n$ is
not subsumed by $\Phi$ to conclude $\Psi \not\le \Phi$ using Proposition 7, as desired.
Suppose for contradiction that $\mathrm{PROP}_n; s_A; \mathrm{PROP}_n$ is subsumed by $\Phi$;
then the state $s_A$ must be subsumed by some state $s_i$ in $\Phi$. Consider the corresponding
clause $C_i$ of $S$. Since $A$ satisfies $S$ we have that $C_i$ is satisfied and at least one of
its literals $l_{i,k}$ must be true. Assume that $l_{i,k} = p_j$ (a dual argument holds for
$l_{i,k} = \neg p_j$); then we have that $s_i$ contains $\mathit{False}_j$ while $s_A$ contains
$\mathit{True}_j$ but not $\mathit{False}_j$. Thus, we have that $s_A \not\le s_i$ (since
$s_i \not\subseteq s_A$), contradicting our choice of $i$.

To complete the proof, we now assume that $S$ is unsatisfiable and show that $\Psi \le \Phi$.
Using Proposition 7, we consider arbitrary $\Phi'$ in $IS(\Psi)$; we will show that
$\Phi' \le \Phi$. From Lemma 10 we know there is some truth assignment $A$ such that
$\Phi' \le \mathrm{PROP}_n; s_A; \mathrm{PROP}_n$. Since $S$ is unsatisfiable we know that some
$C_i$ is not satisfied by $A$ and hence $\neg C_i$ is satisfied by $A$. This implies that
each primitive proposition in $s_i$ is in $s_A$. Let $W$ be the following interdigitation
between $T = \mathrm{PROP}_n; s_A; \mathrm{PROP}_n$ and $\Phi = s_1; \ldots; s_m$:

$$\{\langle \mathrm{PROP}_n, s_1\rangle, \langle \mathrm{PROP}_n, s_2\rangle, \ldots, \langle \mathrm{PROP}_n, s_i\rangle, \langle s_A, s_i\rangle, \langle \mathrm{PROP}_n, s_i\rangle, \langle \mathrm{PROP}_n, s_{i+1}\rangle, \ldots, \langle \mathrm{PROP}_n, s_m\rangle\}$$

We see that in each tuple of co-occurring states given above the state from $T$ is subsumed by
the state from $\Phi$. Thus $W$ is a witnessing interdigitation to
$\mathrm{PROP}_n; s_A; \mathrm{PROP}_n \le \Phi$, which then holds by Proposition 2. Combining
this with $\Phi' \le \mathrm{PROP}_n; s_A; \mathrm{PROP}_n$ we get that $\Phi' \le \Phi$. □
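
The state encodings used in the proof of Theorem 11 can be checked mechanically on a small
instance. The sketch below is illustrative only: the representation of a literal as a signed
integer, of a proposition as a (name, index) pair, and the function names are all assumptions
of ours. It verifies the key fact used in both directions of the proof, namely that
$s_i \subseteq s_A$ (equivalently $s_A \le s_i$) holds exactly when $A$ falsifies $C_i$.

```python
from itertools import product


def assignment_state(assignment):
    """s_A: True_k for variables mapped to true, False_k for the others."""
    return frozenset(("True" if v else "False", k) for k, v in assignment.items())


def clause_state(clause):
    """s_i for a clause given as signed ints (k means p_k, -k means not p_k)."""
    return frozenset(("False", lit) if lit > 0 else ("True", -lit) for lit in clause)


def satisfies(assignment, clause):
    """A satisfies the clause when at least one literal evaluates to true."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)


clause = (1, -2, 3)  # p1 v ~p2 v p3, an arbitrary illustrative clause
for values in product([False, True], repeat=3):
    a = dict(zip((1, 2, 3), values))
    # s_i is contained in s_A exactly when A falsifies C_i.
    assert (clause_state(clause) <= assignment_state(a)) == (not satisfies(a, clause))
```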

Given this hardness result we later define a weaker polynomial-time-computable subsumption
notion for use in our learning algorithms.

4.2 Least-General Generalization

An AMA LGG of a set of AMA formulas is an AMA formula that is more general than each
formula in the set and not strictly more general than any other such formula. The existence of
an AMA LGG is nontrivial as there can be infinite chains of increasingly specific formulas all
of which generalize given formulas. Example 2 demonstrated such chains for an MA subsumee and
can be extended for AMA subsumees. For example, each member of the chain $P; Q$,
$P; Q; P; Q$, $P; Q; P; Q; P; Q$, $\ldots$ covers $\Psi_1 = (P \wedge Q); Q$ and
$\Psi_2 = P; (P \wedge Q)$. Despite such complications, the AMA LGG does exist.

Theorem 12. There is an LGG for any finite set $\Sigma$ of AMA formulas that is subsumed by all
other generalizations of $\Sigma$.

Proof: Let $\Gamma$ be the set $\bigcup_{\Psi' \in \Sigma} IS(\Psi')$. Let $\Psi$ be the
conjunction of all the MA timelines that generalize $\Gamma$ while having size no larger than
$\Gamma$. Since there are only a finite number of primitive propositions, there are only a
finite number of such timelines, so $\Psi$ is well defined (see footnote 9). We show that
$\Psi$ is a least-general generalization of $\Sigma$. First, note that each timeline in $\Psi$
generalizes $\Gamma$ and thus $\Sigma$ (by Proposition 6), so $\Psi$ must generalize $\Sigma$.
Now, consider an arbitrary generalization $\Psi'$ of $\Sigma$. Proposition 7 implies that
$\Psi'$ must generalize each formula in $\Gamma$. Lemma 5 then implies that each timeline of
$\Psi'$ must subsume a timeline $\Phi$ that is no longer than the size of $\Gamma$ and that also
subsumes the timelines of $\Gamma$. But then $\Phi$ must be a timeline of $\Psi$, by our choice
of $\Psi$, so that every timeline of $\Psi'$ subsumes a timeline of $\Psi$. It follows that
$\Psi'$ subsumes $\Psi$, and that $\Psi$ is an LGG of $\Sigma$ subsumed by all other LGGs of
$\Sigma$, as desired. □

Given that the AMA LGG exists and is unique we now show how to compute it. Our first step is to
strengthen "or-to-and" from Proposition 6 to get an LGG for the MA sublanguage.

Theorem 13. For a set $\Sigma$ of MA formulas, the conjunction of all MA timelines in
$IG(\Sigma)$ is an AMA LGG of $\Sigma$.

Proof: Let $\Psi$ be the specified conjunction. Since each timeline of $IG(\Sigma)$ subsumes all
timelines in $\Sigma$, $\Psi$ subsumes each member of $\Sigma$. To show $\Psi$ is a
least-general such formula, consider an AMA formula $\Psi'$ that also subsumes all members of
$\Sigma$. Since each timeline of $\Psi'$ must subsume all members of $\Sigma$, Lemma 5 implies
that each timeline of $\Psi'$ subsumes a member of $IG(\Sigma)$ and thus each timeline of
$\Psi'$ subsumes $\Psi$. This implies $\Psi \le \Psi'$. □

We can now characterize the AMA LGG using IS and IG. 

Theorem 14. $IG(\bigcup_{\Psi \in \Sigma} IS(\Psi))$ is an AMA LGG of the set $\Sigma$ of AMA
formulas.

Proof: Let $\Sigma = \{\Psi_1, \ldots, \Psi_n\}$ and $E = \Psi_1 \vee \cdots \vee \Psi_n$. We
know that the AMA LGG of $\Sigma$ must subsume $E$, or it would fail to subsume one of the
$\Psi_i$. Using "and-to-or" we can represent $E$ as a disjunction of MA timelines given by
$E = (\bigvee IS(\Psi_1)) \vee \cdots \vee (\bigvee IS(\Psi_n))$. Any AMA LGG must be a
least-general formula that subsumes $E$, i.e., an AMA LGG of the set of MA timelines
$\bigcup\{IS(\Psi) \mid \Psi \in \Sigma\}$. Theorem 13 tells us that an LGG of these timelines is
given by $IG(\bigcup\{IS(\Psi) \mid \Psi \in \Sigma\})$. □



9. There must be at least one such timeline, the timeline where the only state is true.

     1  semantic-LGG($\{\Psi_1, \Psi_2, \ldots, \Psi_m\}$)
     2  // Input: AMA formulas $\Psi_1, \ldots, \Psi_m$
     3  // Output: LGG of $\{\Psi_1, \ldots, \Psi_m\}$
     4  $S := \{\}$;
     5  for $i := 1$ to $m$
     6      for each $\Phi$ in all-values(an-IS-member($\Psi_i$))
     7          if ($\forall \Phi' \in S \,.\, \Phi \not\le \Phi'$)
     8          then $S' := \{\Phi'' \in S \mid \Phi'' \le \Phi\}$;
     9               $S := (S - S') \cup \{\Phi\}$;
    10  $G := \{\}$;
    11  for each $\Phi$ in all-values(an-IG-member($S$))
    12      if ($\forall \Phi' \in G \,.\, \Phi' \not\le \Phi$)
    13      then $G' := \{\Phi'' \in G \mid \Phi \le \Phi''\}$;
    14           $G := (G - G') \cup \{\Phi\}$;
    15  return ($\bigwedge G$)

Figure 8: Pseudo-code for computing the semantic AMA LGG of a set of AMA formulas.



Theorem 14 leads directly to an algorithm for computing the AMA LGG; Figure 8 gives
pseudo-code for the computation. Lines 4-9 of the pseudo-code correspond to the computation
of $\bigcup\{IS(\Psi) \mid \Psi \in \Sigma\}$, where timelines are not included in the set if
they are subsumed by timelines already in the set (which can be checked with the
polynomial-time MA subsumption algorithm). This pruning, accomplished by the if test in line 7,
often drastically reduces the size of the timeline set for which we perform the subsequent IG
computation. The final result is not affected by the pruning since the subsequent IG
computation is a generalization step. The remainder of the pseudo-code corresponds to the
computation of $IG(\bigcup\{IS(\Psi) \mid \Psi \in \Sigma\})$, where we do not include timelines
in the final result that subsume some other timeline in the set. This pruning step (the if test
in line 12) is sound since when one timeline subsumes another, the conjunction of those
timelines is equivalent to the most specific one. Section 4.4.1 traces the computations of this
algorithm for an example LGG calculation.
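
The two pruned passes of Figure 8 can be sketched in Python as follows. This is a hedged
illustration, not the authors' implementation: states are frozensets of proposition names, an
AMA formula is a non-empty list of timelines, interdigitations are enumerated as index paths
that advance at least one timeline per step, and all function names are our own. The result is
returned as a set of timelines whose conjunction is the LGG.

```python
from itertools import product


def ma_subsumes(phi1, phi2):
    """Polynomial-time MA subsumption via reachability over pairs (i, j)."""
    m, n = len(phi1), len(phi2)
    reach = [[False] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            if phi2[j] <= phi1[i]:  # s_i <= t_j
                reach[i][j] = (
                    (i == 0 and j == 0)
                    or (i > 0 and reach[i - 1][j])
                    or (j > 0 and reach[i][j - 1])
                    or (i > 0 and j > 0 and reach[i - 1][j - 1])
                )
    return reach[m - 1][n - 1]


def interdigitations(timelines):
    """Yield each interdigitation as a list of tuples of co-occurring states."""
    end = tuple(len(t) - 1 for t in timelines)

    def extend(path):
        cur = path[-1]
        if cur == end:
            yield [tuple(t[k] for t, k in zip(timelines, idx)) for idx in path]
            return
        for step in product((0, 1), repeat=len(cur)):
            nxt = tuple(c + d for c, d in zip(cur, step))
            if any(step) and all(x <= e for x, e in zip(nxt, end)):
                yield from extend(path + [nxt])

    yield from extend([tuple(0 for _ in timelines)])


def combine(timelines, op):
    """Apply op to each tuple of co-occurring states (union gives IS, intersection IG)."""
    return [tuple(op(tup) for tup in w) for w in interdigitations(timelines)]


def union_all(states):
    return frozenset().union(*states)


def intersect_all(states):
    return frozenset.intersection(*states)


def semantic_lgg(formulas):
    """Figure 8 in outline: pruned union of IS sets, then pruned IG of that union."""
    s = []
    for psi in formulas:                                   # lines 5-9
        for phi in combine(psi, union_all):
            if not any(ma_subsumes(phi, other) for other in s):
                s = [o for o in s if not ma_subsumes(o, phi)] + [phi]
    g = []
    for phi in combine(s, intersect_all):                  # lines 11-14
        if not any(ma_subsumes(other, phi) for other in g):
            g = [o for o in g if not ma_subsumes(phi, o)] + [phi]
    return set(g)


A, B, TRUE = frozenset("A"), frozenset("B"), frozenset()
print(semantic_lgg([[(A, B), (B, A)], [(A, B, A)]]))  # LGG of (A;B)^(B;A) and A;B;A
```

Both loops enumerate interdigitations eagerly, so the sketch exhibits the doubly exponential
worst case discussed next and is suitable only for very small inputs.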

Since the sizes of both $IS(\cdot)$ and $IG(\cdot)$ are exponential in the sizes of their
inputs, the code in Figure 8 is doubly exponential in the input size. We conjecture that we
cannot do better than this, but we have not yet proven a doubly exponential lower bound for the
AMA case. When the input formulas are MA timelines the algorithm takes singly exponential time,
since $IS(\{\Phi\}) = \Phi$ when $\Phi$ is in MA. We now prove an exponential lower bound when
the input formulas are in MA. Again, readers uninterested in the technical details of this proof
can safely skip forward to Section 4.3.

For this argument, we take the available primitive propositions to be those in the set
$\{p_{i,j} \mid 1 \le i \le n,\ 1 \le j \le n\}$, and consider the MA timelines

$$\Phi_1 = s_{1,*}; s_{2,*}; \ldots; s_{n,*} \quad\text{and}\quad \Phi_2 = s_{*,1}; s_{*,2}; \ldots; s_{*,n},$$

where

$$s_{i,*} = p_{i,1} \wedge \cdots \wedge p_{i,n} \quad\text{and}\quad s_{*,j} = p_{1,j} \wedge \cdots \wedge p_{n,j}.$$

We will show that any AMA LGG of $\Phi_1$ and $\Phi_2$ must contain an exponential number of
timelines. In particular, we will show that any AMA LGG is equivalent to the conjunction of a
subset of $IG(\{\Phi_1, \Phi_2\})$, and that certain timelines may not be omitted from such a
subset.

Lemma 15. Any AMA LGG $\Psi$ of a set $\Sigma$ of MA timelines is equivalent to a conjunction
$\Psi'$ of timelines from $IG(\Sigma)$ with $|\Psi'| \le |\Psi|$.

Proof: Lemma 5 implies that any timeline $\Phi$ in $\Psi$ must subsume some timeline
$\Phi' \in IG(\Sigma)$. But then the conjunction $\Psi'$ of such $\Phi'$ must be equivalent to
$\Psi$, since it clearly covers $\Sigma$ and is covered by the LGG $\Psi$. Since $\Psi'$ was
formed by taking one timeline from $IG(\Sigma)$ for each timeline in $\Psi$, we have
$|\Psi'| \le |\Psi|$. □

We can complete our argument then by showing that exponentially many timelines in
$IG(\{\Phi_1, \Phi_2\})$ cannot be omitted from such a conjunction while it remains an LGG.

Notice that for any $i, j$ we have that $s_{i,*} \cap s_{*,j} = p_{i,j}$. This implies that any
state in $IG(\{\Phi_1, \Phi_2\})$ contains exactly one proposition, since each such state is
formed by intersecting a state from $\Phi_1$ and $\Phi_2$. Furthermore, the definition of
interdigitation, applied here, implies the following two facts for any timeline
$q_1; q_2; \ldots; q_m$ in $IG(\{\Phi_1, \Phi_2\})$:

1. $q_1 = p_{1,1}$ and $q_m = p_{n,n}$.

2. For consecutive states $q_k = p_{i,j}$ and $q_{k+1} = p_{i',j'}$, $i'$ is either $i$ or
$i + 1$, $j'$ is either $j$ or $j + 1$, and not both $i = i'$ and $j = j'$.

Together these facts imply that any timeline in $IG(\{\Phi_1, \Phi_2\})$ is a sequence of
propositions starting with $p_{1,1}$ and ending with $p_{n,n}$ such that any consecutive
propositions $p_{i,j}; p_{i',j'}$ are different, with $i'$ equal to $i$ or $i + 1$ and $j'$ equal
to $j$ or $j + 1$. We call a timeline in $IG(\{\Phi_1, \Phi_2\})$ square if and only if each
pair of consecutive propositions $p_{i,j}$ and $p_{i',j'}$ have either $i' = i$ or $j' = j$. The
following lemma implies that no square timeline can be omitted from the conjunction of timelines
in $IG(\{\Phi_1, \Phi_2\})$ if it is to remain an LGG of $\Phi_1$ and $\Phi_2$.

Lemma 16. Let $\Phi_1$ and $\Phi_2$ be as given above and let
$\Psi = \bigwedge IG(\{\Phi_1, \Phi_2\})$. For any $\Psi'$ whose timelines are a subset of those
in $\Psi$ that omits some square timeline, we have $\Psi < \Psi'$.

The number of square timelines in $IG(\{\Phi_1, \Phi_2\})$ is equal to
$\frac{(2n-2)!}{(n-1)!\,(n-1)!}$ and hence is exponential in the size of $\Phi_1$ and $\Phi_2$.
We have now completed the proof of the following result.

Theorem 17. The smallest LGG of two MA formulas can be exponentially large.

Proof: By Lemma 15, any AMA LGG $\Psi'$ of $\Phi_1$ and $\Phi_2$ is equivalent to a conjunction
of at most as many timelines chosen from $IG(\{\Phi_1, \Phi_2\})$. However, by Lemma 16, any
such conjunction must have at least $\frac{(2n-2)!}{(n-1)!\,(n-1)!}$ timelines, and then so must
$\Psi'$, which must then be exponentially large. □
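
The count of square timelines can be sanity-checked by brute force for small $n$. The sketch
below relies on the observation above that each state of $IG(\{\Phi_1, \Phi_2\})$ is a single
proposition $p_{i,j}$, so a timeline of $IG(\{\Phi_1, \Phi_2\})$ can be identified with its
sequence of index pairs. The function names are illustrative and the enumeration is only
feasible for small $n$.

```python
from math import comb


def ig_index_paths(n):
    """Yield the index sequences (i, j) of all timelines in IG({Phi_1, Phi_2})."""

    def extend(path):
        i, j = path[-1]
        if (i, j) == (n, n):
            yield list(path)
            return
        # Consecutive states advance i, j, or both by one (facts 1 and 2 above).
        for di, dj in ((1, 0), (0, 1), (1, 1)):
            if i + di <= n and j + dj <= n:
                yield from extend(path + [(i + di, j + dj)])

    yield from extend([(1, 1)])


def is_square(path):
    """Square: consecutive propositions p_{i,j}, p_{i',j'} have i' = i or j' = j."""
    return all(a[0] == b[0] or a[1] == b[1] for a, b in zip(path, path[1:]))


for n in range(1, 6):
    squares = sum(1 for p in ig_index_paths(n) if is_square(p))
    # (2n-2)! / ((n-1)! (n-1)!) is the binomial coefficient C(2n-2, n-1).
    assert squares == comb(2 * n - 2, n - 1)
    print(n, squares)
```

A square timeline makes exactly $n - 1$ steps in each index, one index at a time, which is why
the count is the central binomial coefficient.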

Conjecture 18. The smallest LGG of two AMA formulas can be doubly exponentially large.

We now show that our lower bound on AMA LGG complexity is not merely a consequence of
the existence of large AMA LGGs. Even when there is a small LGG, it can be expensive to compute
due to the difficulty of testing AMA subsumption:

Theorem 19. Determining whether a formula $\Psi$ is an AMA LGG for two given AMA formulas
$\Psi_1$ and $\Psi_2$ is co-NP-hard, and is in co-NEXP, in the size of all three formulas
together.

Proof: To show co-NP-hardness we use a straightforward reduction from AMA subsumption. Given
two AMA formulas $\Psi_1$ and $\Psi_2$ we decide $\Psi_1 \le \Psi_2$ by asking whether $\Psi_2$
is an AMA LGG of $\Psi_1$ and $\Psi_2$. Clearly $\Psi_1 \le \Psi_2$ iff $\Psi_2$ is an LGG of the
two formulas.

To show the co-NEXP upper bound, note that we can check in exponential time whether
$\Psi_1 \le \Psi$ and $\Psi_2 \le \Psi$ using Proposition 7 and the polynomial-time MA
subsumption algorithm. It remains to show that we can check whether $\Psi$ is not the "least"
subsumer. Since Theorem 14 shows that the LGG of $\Psi_1$ and $\Psi_2$ is
$IG(IS(\Psi_1) \cup IS(\Psi_2))$, if $\Psi$ is not the LGG then
$\Psi \not\le IG(IS(\Psi_1) \cup IS(\Psi_2))$. Thus, by Proposition 7, if $\Psi$ is not a least
subsumer, there must be timelines $\Phi_1 \in IS(\Psi)$ and
$\Phi_2 \in IG(IS(\Psi_1) \cup IS(\Psi_2))$ such that $\Phi_1 \not\le \Phi_2$. We can then use
exponentially long certificates for "No" answers: each certificate is a pair of an
interdigitation $I_1$ of $\Psi$ and an interdigitation $I_2$ of $IS(\Psi_1) \cup IS(\Psi_2)$,
such that the corresponding members $\Phi_1 \in IS(\Psi)$ and
$\Phi_2 \in IG(IS(\Psi_1) \cup IS(\Psi_2))$ have $\Phi_1 \not\le \Phi_2$. Given the pair of
certificates $I_1$ and $I_2$, $\Phi_1$ can be computed in polynomial time, $\Phi_2$ can be
computed in exponential time, and the subsumption between them can be checked in polynomial time
(relative to their size, which can be exponential). If $\Psi$ is the LGG then
$\Psi \le IG(IS(\Psi_1) \cup IS(\Psi_2))$, so that no such certificates will exist. □

4.3 Syntactic Subsumption and Syntactic Least-General Generalization

Given the intractability results for semantic AMA subsumption, we now introduce a tractable
generality notion, syntactic subsumption, and discuss the corresponding LGG problem. The use of
syntactic forms of generality for efficiency is familiar in ILP (Muggleton & De Raedt, 1994),
where, for example, $\theta$-subsumption is often used in place of the entailment generality
relation. Unlike AMA semantic subsumption, syntactic subsumption requires checking only
polynomially many MA subsumptions, each in polynomial time (via Theorem 9).

Definition 5. AMA $\Psi_1$ is syntactically subsumed by AMA $\Psi_2$ (written
$\Psi_1 \le_{syn} \Psi_2$) iff for each MA timeline $\Phi_2 \in \Psi_2$, there is an MA timeline
$\Phi_1 \in \Psi_1$ such that $\Phi_1 \le \Phi_2$.

Proposition 20. AMA syntactic subsumption can be decided in polynomial time. 

Syntactic subsumption trivially implies semantic subsumption; however, the converse does not
hold in general. Consider the AMA formulas $(A; B) \wedge (B; A)$ and $A; B; A$, where $A$ and
$B$ are primitive propositions. We have $(A; B) \wedge (B; A) \le A; B; A$; however, we have
neither $A; B \le A; B; A$ nor $B; A \le A; B; A$, so that $A; B; A$ does not syntactically
subsume $(A; B) \wedge (B; A)$. Syntactic subsumption fails to recognize constraints that are
only derived from the interaction of timelines within a formula.
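
Definition 5 translates directly into a pair of nested quantifiers over timelines. The Python
sketch below is illustrative: it assumes states are frozensets of proposition names and formulas
are lists of timelines, and the names ma_subsumes and syn_subsumes are our own. It reproduces
the counterexample just discussed.

```python
def ma_subsumes(phi1, phi2):
    """Polynomial-time MA subsumption via reachability over pairs (i, j)."""
    m, n = len(phi1), len(phi2)
    reach = [[False] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            if phi2[j] <= phi1[i]:  # s_i <= t_j
                reach[i][j] = (
                    (i == 0 and j == 0)
                    or (i > 0 and reach[i - 1][j])
                    or (j > 0 and reach[i][j - 1])
                    or (i > 0 and j > 0 and reach[i - 1][j - 1])
                )
    return reach[m - 1][n - 1]


def syn_subsumes(psi1, psi2):
    """psi1 <=syn psi2: every timeline of psi2 subsumes some timeline of psi1."""
    return all(any(ma_subsumes(p1, p2) for p1 in psi1) for p2 in psi2)


A, B, TRUE = frozenset("A"), frozenset("B"), frozenset()
psi = [(A, B), (B, A)]                       # (A;B) ^ (B;A)
print(syn_subsumes(psi, [(A, B, A)]))        # False: neither A;B nor B;A is below A;B;A
print(syn_subsumes(psi, [(A, B, TRUE), (TRUE, B, A)]))  # True
```

The test performs at most one MA subsumption check per pair of timelines, which is the source
of the polynomial bound in Proposition 20.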

Syntactic Least-General Generalization. A syntactic AMA LGG is a syntactically least-general
AMA formula that syntactically subsumes the input AMA formulas. Here, "least" means that no
formula properly syntactically subsumed by a syntactic LGG can syntactically subsume the input
formulas. Based on the hardness gap between syntactic and semantic AMA subsumption, one might
conjecture that a similar gap exists between the syntactic and semantic LGG problems. Proving such
a gap exists requires closing the gap between the lower and upper bounds on AMA LGG shown in
Theorem 14 in favor of the upper bound, as suggested by Conjecture 18. While we cannot yet
show a hardness gap between semantic and syntactic LGG, we do give a syntactic LGG algorithm
that is exponentially more efficient than the best semantic LGG algorithm we have found (that of
Theorem 14). First, we show that syntactic LGGs exist and are unique up to mutual syntactic
subsumption (and hence up to semantic equivalence).

Theorem 21. There exists a syntactic LGG for any AMA formula set $\Sigma$ that is syntactically
subsumed by all syntactic generalizations of $\Sigma$.

Proof: Let $\Psi$ be the conjunction of all the MA timelines that syntactically generalize
$\Sigma$ while having size no larger than $\Sigma$. As in the proof of Theorem 12, $\Psi$ is well
defined. We show that $\Psi$ is a syntactic LGG for $\Sigma$. First, note that $\Psi$
syntactically generalizes $\Sigma$ because each timeline of $\Psi$ generalizes a timeline in
every member of $\Sigma$, by the choice of $\Psi$. Now consider an arbitrary syntactic
generalization $\Psi'$ of $\Sigma$. By the definition of syntactic subsumption, each timeline
$\Phi$ in $\Psi'$ must subsume some timeline $\Phi_\alpha$ in each member $\alpha$ of $\Sigma$.
Lemma 5 then implies that there is a timeline $\Phi'$ of size no larger than $\Sigma$ that
subsumes all the $\Phi_\alpha$ while being subsumed by $\Phi$. By our choice of $\Psi$, the
timeline $\Phi'$ must be a timeline of $\Psi$. It follows then that $\Psi'$ syntactically
subsumes $\Psi$, and that $\Psi$ is a syntactic LGG of $\Sigma$ subsumed by all other syntactic
generalizations of $\Sigma$. □

In general, we know that semantic and syntactic LGGs are different, though clearly the syntactic
LGG is a semantic generalization and so must subsume the semantic LGG. For example,
$(A; B) \wedge (B; A)$ and $A; B; A$ have a semantic LGG of $A; B; A$, as discussed above; but
their syntactic LGG is $(A; B; \mathit{true}) \wedge (\mathit{true}; B; A)$, which subsumes
$A; B; A$ but is not subsumed by $A; B; A$. Even so, for MA formulas:

Proposition 22. For MA $\Phi$ and AMA $\Psi$, $\Phi \le_{syn} \Psi$ is equivalent to
$\Phi \le \Psi$.

Proof: The forward direction is immediate since we already know syntactic subsumption implies
semantic subsumption. For the reverse direction, note that $\Phi \le \Psi$ implies that each
timeline of $\Psi$ subsumes $\Phi$. Thus, since $\Phi$ is a single timeline, each timeline in
$\Psi$ subsumes "some timeline" in $\Phi$, which is the definition of syntactic subsumption. □

Proposition 23. Any syntactic AMA LGG for an MA formula set $\Sigma$ is also a semantic LGG for
$\Sigma$.

Proof: Consider a syntactic LGG $\Psi$ for $\Sigma$. Proposition 22 implies that $\Psi$ is a
semantic generalization of $\Sigma$. Consider any semantic LGG $\Psi'$ of $\Sigma$. We show that
$\Psi \le \Psi'$ to conclude that $\Psi$ is a semantic LGG for $\Sigma$. Proposition 22 implies
that $\Psi'$ syntactically subsumes $\Sigma$. It follows that $\Psi' \wedge \Psi$ syntactically
subsumes $\Sigma$. But $\Psi' \wedge \Psi$ is syntactically subsumed by $\Psi$, which is a
syntactic LGG of $\Sigma$; it follows that $\Psi' \wedge \Psi$ syntactically subsumes $\Psi$, or
$\Psi$ would not be a least syntactic generalization of $\Sigma$. But then
$\Psi \le (\Psi' \wedge \Psi)$, which implies $\Psi \le \Psi'$, as desired. □

We note that the stronger result stating that a formula $\Psi$ is a syntactic LGG of a set
$\Sigma$ of MA formulas if and only if it is a semantic LGG of $\Sigma$ is not an immediate
consequence of our results above. At first examination, the strengthening appears trivial, given
the equivalence of $\Phi \le \Psi$ and $\Phi \le_{syn} \Psi$ for MA $\Phi$. However, being
semantically least is not necessarily a stronger condition than being syntactically least: we
have not ruled out the possibility that a semantically least generalization $\Psi$ may
syntactically subsume another generalization that is semantically (but not syntactically)
equivalent. (This question is open, as we have not found an example of this phenomenon either.)

Proposition 23 together with Theorem 21 have the nice consequence for our learning approach 
that the syntactic LGG of two AMA formulas is a semantic LGG of those formulas, as long as the 
original formulas are themselves syntactic LGGs of sets of MA timelines. Because our learning ap- 
proach starts with training examples that are converted to MA timelines using the LGCF operation, 
the syntactic LGGs computed (whether combining all the training examples at once, or incremen- 
tally computing syntactic LGGs of parts of the training data) are always syntactic LGGs of sets of 
MA timelines and hence are also semantic LGGs, in spite of the fact that syntactic subsumption is 
weaker than semantic subsumption. We note, however, that the resulting semantic LGGs may be 
considerably larger than the smallest semantic LGG (which may not be a syntactic LGG at all). 

Using Proposition 23, we now show that we cannot hope for a polynomial-time syntactic LGG 
algorithm. 

Theorem 24. The smallest syntactic LGG of two MA formulas can be exponentially large. 

Proof: Suppose there is always a syntactic LGG of two MA formulas that is not exponentially large. 

Since by Proposition 23 each such formula is also a semantic LGG, there is always a semantic LGG 
of two MA formulas that is not exponentially large. This contradicts Theorem 17. □ 

While this is discouraging, we have an algorithm for the syntactic LGG whose time complexity
matches this lower bound, unlike the semantic LGG case, where the best algorithm we have is
doubly exponential in the worst case. Theorem 14 yields an exponential-time method for computing
the semantic LGG of a set of MA timelines $\Sigma$: since for a timeline $\Phi$,
$IS(\Phi) = \Phi$, we can simply conjoin all the timelines of $IG(\Sigma)$. Given a set of AMA
formulas, the syntactic LGG algorithm uses this method to compute the polynomially-many semantic
LGGs of sets of timelines, one chosen from each input formula, and conjoins all the results.

Theorem 25. The formula $\bigwedge_{\Phi_i \in \Psi_i} IG(\{\Phi_1, \ldots, \Phi_n\})$ is a
syntactic LGG of the AMA formulas $\Psi_1, \ldots, \Psi_n$.

Proof: Let $\Psi$ be $\bigwedge_{\Phi_i \in \Psi_i} IG(\{\Phi_1, \ldots, \Phi_n\})$. Each
timeline $\Phi$ of $\Psi$ must subsume each $\Psi_i$ because $\Phi$ is an output of IG on a set
containing a timeline of $\Psi_i$; thus $\Psi$ syntactically subsumes each $\Psi_i$. To show
that $\Psi$ is a syntactically least such formula, consider a $\Psi'$ that syntactically
subsumes every $\Psi_i$. We show that $\Psi \le_{syn} \Psi'$ to conclude. Each timeline $\Phi'$
in $\Psi'$ subsumes a timeline $T_i \in \Psi_i$, for each $i$, by our assumption that
$\Psi_i \le_{syn} \Psi'$. But then by Lemma 5, $\Phi'$ must subsume a member of
$IG(\{T_1, \ldots, T_n\})$, and that member is a timeline of $\Psi$, so each timeline $\Phi'$ of
$\Psi'$ subsumes a timeline of $\Psi$. We conclude $\Psi \le_{syn} \Psi'$, as desired. □

This theorem yields an algorithm that computes a syntactic AMA LGG in exponential time;
pseudo-code for this method is given in Figure 9. The exponential time bound follows from the
fact that there are exponentially many ways to choose $\Phi_1, \ldots, \Phi_m$ in line 5, and for
each of these there are exponentially many semantic-LGG members in line 6 (since the $\Phi_i$
are all MA timelines); the product of these two exponentials is still an exponential.

     1  syntactic-LGG($\{\Psi_1, \Psi_2, \ldots, \Psi_m\}$)
     2  // Input: AMA formulas $\{\Psi_1, \ldots, \Psi_m\}$
     3  // Output: syntactic LGG of $\{\Psi_1, \ldots, \Psi_m\}$
     4  $G := \{\}$;
     5  for each $\langle \Phi_1, \ldots, \Phi_m \rangle \in \Psi_1 \times \cdots \times \Psi_m$
     6      for each $\Phi$ in semantic-LGG($\{\Phi_1, \ldots, \Phi_m\}$)
     7          if ($\forall \Phi' \in G \,.\, \Phi' \not\le \Phi$)
     8          then $G' := \{\Phi'' \in G \mid \Phi \le \Phi''\}$;
     9               $G := (G - G') \cup \{\Phi\}$;
    10  return ($\bigwedge G$)

Figure 9: Pseudo-code that computes the syntactic AMA LGG of a set of AMA formulas.



The formula returned by the algorithm shown is actually a subset of the syntactic LGG given
by Theorem 25. This subset is syntactically (and hence semantically) equivalent to the formula
specified by the theorem, but is possibly smaller due to the pruning achieved by the if statement
in lines 7-9. A timeline is pruned from the set if it (semantically) subsumes any other timeline
in the set (one timeline is kept from any semantically equivalent group of timelines, at random).
This pruning of timelines is sound, since a timeline is pruned from the output only if it
subsumes some other formula in the output; this fact allows an easy argument that the pruned
formula is syntactically equivalent to (i.e., mutually syntactically subsumed by) the unpruned
formula. Section 4.4.2 traces the computations of this algorithm for an example LGG calculation.
We note that in our empirical evaluation discussed in Section 6, there was no cost in terms of
accuracy for using the more efficient syntactic vs. semantic LGG. We know this because our
learned definitions made errors in the direction of being overly specific; thus, since the
semantic LGG is at least as specific as the syntactic LGG, there would be no advantage to using
the semantic algorithm.
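
The construction of Theorem 25 with the pruning of Figure 9 can be sketched in Python as below.
This is an illustrative sketch under our own assumptions, not the authors' code: states are
frozensets of proposition names, formulas are non-empty lists of timelines, and for each tuple
of MA timelines the semantic LGG is taken to be the pruned IG of the tuple, as Theorem 13
permits. All function names are hypothetical.

```python
from itertools import product


def ma_subsumes(phi1, phi2):
    """Polynomial-time MA subsumption via reachability over pairs (i, j)."""
    m, n = len(phi1), len(phi2)
    reach = [[False] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            if phi2[j] <= phi1[i]:  # s_i <= t_j
                reach[i][j] = (
                    (i == 0 and j == 0)
                    or (i > 0 and reach[i - 1][j])
                    or (j > 0 and reach[i][j - 1])
                    or (i > 0 and j > 0 and reach[i - 1][j - 1])
                )
    return reach[m - 1][n - 1]


def interdigitations(timelines):
    """Yield each interdigitation as a list of tuples of co-occurring states."""
    end = tuple(len(t) - 1 for t in timelines)

    def extend(path):
        cur = path[-1]
        if cur == end:
            yield [tuple(t[k] for t, k in zip(timelines, idx)) for idx in path]
            return
        for step in product((0, 1), repeat=len(cur)):
            nxt = tuple(c + d for c, d in zip(cur, step))
            if any(step) and all(x <= e for x, e in zip(nxt, end)):
                yield from extend(path + [nxt])

    yield from extend([tuple(0 for _ in timelines)])


def add_pruned(g, phi):
    """Keep phi only if it subsumes nothing in g; drop members of g that subsume phi."""
    if any(ma_subsumes(other, phi) for other in g):
        return g
    return [o for o in g if not ma_subsumes(phi, o)] + [phi]


def semantic_lgg_ma(timelines):
    """Pruned IG of a tuple of MA timelines (intersect co-occurring states)."""
    g = []
    for w in interdigitations(timelines):
        g = add_pruned(g, tuple(frozenset.intersection(*tup) for tup in w))
    return g


def syntactic_lgg(formulas):
    """Conjoin the MA LGGs of every way of picking one timeline per formula."""
    g = []
    for choice in product(*formulas):          # line 5 of Figure 9
        for phi in semantic_lgg_ma(choice):    # line 6
            g = add_pruned(g, phi)             # lines 7-9
    return set(g)


A, B, TRUE = frozenset("A"), frozenset("B"), frozenset()
print(syntactic_lgg([[(A, B), (B, A)], [(A, B, A)]]))
```

On the running example this yields the two timelines $A; B; \mathit{true}$ and
$\mathit{true}; B; A$, matching the syntactic LGG stated earlier in this section.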

The method does an exponential amount of work even if the result is small (typically because 
many timelines can be pruned from the output because they subsume what remains). It is still an 
open question as to whether there is an output-efficient algorithm for computing the syntactic AMA 
LGG — this problem is in coNP and we conjecture that it is coNP-complete. One route to settling 
this question is to determine the output complexity of semantic LGG for MA input formulas. We 
believe that problem also to be coNP-complete, but have not proven this; if that problem is in P, 
there is an output-efficient method for computing syntactic AMA LGG based on Theorem 25. 

A summary of the algorithmic complexity results from this section can be found in Table 3 in 
the conclusions section of this paper. 

4.4 Examples: Least-General Generalization Calculations 

Below we work through the details of a semantic and a syntactic LGG calculation. We consider the
AMA formulas $\Psi = (A; B) \wedge (B; A)$ and $\Phi = A; B; A$, for which the semantic LGG is
$A; B; A$ and the syntactic LGG is $(A; B; \mathit{true}) \wedge (\mathit{true}; B; A)$.

4.4.1 Semantic LGG Example

The first step in calculating the semantic LGG, according to the algorithm given in Figure 8, is
to compute the interdigitation-specializations of the input formulas (i.e., $IS(\Phi)$ and
$IS(\Psi)$). Trivially, we have that $IS(\Phi) = \Phi = A; B; A$. To calculate $IS(\Psi)$, we
must consider the possible interdigitations of $\Psi$, for which there are three,

    $\{ \langle A, B \rangle, \langle B, B \rangle, \langle B, A \rangle \}$

    $\{ \langle A, B \rangle, \langle B, A \rangle \}$

    $\{ \langle A, B \rangle, \langle A, A \rangle, \langle B, A \rangle \}$

Each interdigitation leads to the corresponding member of $IS(\Psi)$ by unioning (conjoining) the
states in each tuple, so $IS(\Psi)$ is

    $\{ (A \wedge B); B; (A \wedge B),$
    $\ \ (A \wedge B),$
    $\ \ (A \wedge B); A; (A \wedge B) \}$.

Lines 5-9 of the semantic LGG algorithm compute the set $S$, which is equal to the union of the
timelines in $IS(\Psi)$ and $IS(\Phi)$, with all subsumed timelines removed. For our formulas,
we see that each timeline in $IS(\Psi)$ is subsumed by $\Phi$; thus, we have that
$S = \Phi = A; B; A$.

After computing $S$, the algorithm returns the conjunction of timelines in $IG(S)$, with
redundant timelines removed (i.e., all subsuming timelines are removed). In our case,
$IG(S) = A; B; A$, trivially, as there is only one timeline in $S$; thus the algorithm correctly
computes the semantic LGG of $\Psi$ and $\Phi$ to be $A; B; A$.

4.4.2 Syntactic LGG Example 

The syntactic LGG algorithm, shown in Figure 9, computes a series of semantic LGGs for MA
timeline sets, returning the conjunction of the results (after pruning). Line 5 of the algorithm
cycles through timeline tuples from the cross-product of the input AMA formulas. In our case the
tuples in $\Phi \times \Psi$ are $T_1 = \langle A; B; A,\ A; B \rangle$ and
$T_2 = \langle A; B; A,\ B; A \rangle$. For each tuple, the algorithm computes the semantic LGG
of the tuple's timelines.

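The semantic LGG computation for each tuple uses the algorithm given in Figure 8, but the
argument is always a set of MA timelines rather than AMA formulas. For this reason, lines 4-9
are superfluous, as for an MA timeline $\Phi'$, $IS(\Phi') = \Phi'$. In the case of tuple $T_1$,
lines 4-9 of the algorithm just compute $S = \{A; B; A,\ A; B\}$. It remains to compute the
interdigitation-generalizations of $S$ (i.e., $IG(S)$), returning the conjunction of those
timelines after pruning (lines 10-15 in Figure 8). The set of all interdigitations of $S$ is

    $\{ \langle A, A \rangle, \langle B, A \rangle, \langle B, B \rangle, \langle B, A \rangle \}$

    $\{ \langle A, A \rangle, \langle B, B \rangle, \langle B, A \rangle \}$

    $\{ \langle A, A \rangle, \langle A, B \rangle, \langle B, B \rangle, \langle B, A \rangle \}$

    $\{ \langle A, A \rangle, \langle A, B \rangle, \langle B, A \rangle \}$

    $\{ \langle A, A \rangle, \langle A, B \rangle, \langle A, A \rangle, \langle B, A \rangle \}$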

By intersecting states in interdigitation tuples we get $IG(S)$,

    $\{ A; \mathit{true}; B; \mathit{true},\ \ A; B; \mathit{true},\ \ A; \mathit{true}; B; \mathit{true},\ \ A; \mathit{true}; \mathit{true},\ \ A; \mathit{true}; A; \mathit{true} \}$

Since the timeline $A; B; \mathit{true}$ is subsumed by all timelines in $IG(S)$, all other
timelines will be pruned. Thus the semantic LGG algorithm returns $A; B; \mathit{true}$ as the
semantic LGG of the timelines in $T_1$.

Next the syntactic LGG algorithm computes the semantic LGG of the timelines in $T_2$. Following
the same steps as for $T_1$, we find that the semantic LGG of the timelines in $T_2$ is
$\mathit{true}; B; A$. Since $A; B; \mathit{true}$ and $\mathit{true}; B; A$ do not subsume one
another, the set $G$ computed by lines 5-9 of the syntactic LGG algorithm is equal to
$\{A; B; \mathit{true},\ \mathit{true}; B; A\}$. Thus, the algorithm computes the syntactic LGG
of $\Phi$ and $\Psi$ to be $(A; B; \mathit{true}) \wedge (\mathit{true}; B; A)$. Note that, in
this case, the syntactic LGG is more general than the semantic LGG.

5. Practical Extensions 

We have implemented a specific-to-general AMA learning algorithm based on the LGCF and
syntactic LGG algorithms presented earlier. This implementation includes four practical extensions.
The first extension aims at controlling the exponential complexity by limiting the length of the
timelines we consider. Second, we describe an often more efficient LGG algorithm based on a
modified algorithm for computing pairwise LGGs. The third extension deals with applying our
propositional algorithm to relational data, as is necessary for the application domain of visual event
recognition. Fourth, we add negation into the AMA language and show how to compute the
corresponding LGCFs and LGGs using our algorithms for AMA (without negation). Adding negation
into AMA turns out to be crucial to achieving good performance in our experiments. We end this
section with a review of the overall complexity of our implemented system.

5.1 k-AMA Least-General Generalization 

We have already indicated that our syntactic AMA LGG algorithm takes exponential time relative
to the lengths of the timelines in the AMA input formulas. This motivates restricting the AMA
language to k-AMA in practice, where formulas contain timelines with no more than k states.
As k is increased, the algorithm is able to output increasingly specific formulas at the cost of an
exponential increase in computational time. In the visual-event-recognition experiments shown
later, as we increased k, the resulting formulas became overly specific before a computational
bottleneck was reached; i.e., for that application the best values of k were practically computable
and the ability to limit k provided a useful language bias.

We use a k-cover operator in order to limit our syntactic LGG algorithm to k-AMA. A k-cover
of an AMA formula is a syntactically least general k-AMA formula that syntactically subsumes
the input; it is easy to show that a k-cover for a formula can be formed by conjoining all k-MA
timelines that syntactically subsume the formula (i.e., that subsume any timeline in the formula).
Figure 10 gives pseudo-code for computing the k-cover of an AMA formula. It can be shown that
this algorithm correctly computes a k-cover for any input AMA formula. The algorithm calculates
the set of least general k-MA timelines that subsume each timeline in the input; the resulting k-MA
formulas are conjoined and "redundant" timelines are pruned using a subsumption test. We note that
the k-cover of an AMA formula may itself be exponentially larger than that formula; however, in
practice, we have found k-covers not to exhibit undue size growth.

Given the k-cover algorithm we restrict our learner to k-AMA as follows: 1) Compute the
k-cover for each AMA input formula. 2) Compute the syntactic AMA LGG of the resulting
k-AMA formulas. 3) Return the k-cover of the resulting AMA formula. The primary bottleneck of
the original syntactic LGG algorithm is computing the exponentially large set of
interdigitation-generalizations; the k-limited algorithm limits this complexity as it only computes
interdigitation-generalizations involving k-MA timelines.

    k-cover(k, Φ1 ∧ ... ∧ Φm)
      // Input: positive natural number k, AMA formula Φ1 ∧ ... ∧ Φm
      // Output: k-cover of Φ1 ∧ ... ∧ Φm
      G := {};
      for i := 1 to m
        for each P := ⟨P1, ..., Pn⟩ in all-values(a-k-partition(k, Φi))
          Φ := (∩P1); ...; (∩Pn);
          if (∀Φ' ∈ G . Φ' ≰ Φ)
            then G' := {Φ'' ∈ G | Φ ≤ Φ''};
                 G := (G − G') ∪ {Φ};
      return (∧G)

    a-k-partition(k, s1; ...; sj)
      // Input: positive natural number k, MA timeline s1; ...; sj
      // Output: a tuple of ≤ k sets of consecutive states that partitions s1, ..., sj
      if j ≤ k then return ⟨{s1}, ..., {sj}⟩;
      if k = 1 then return ⟨{s1, ..., sj}⟩;
      l := a-member-of({1, 2, ..., j − k + 1});    // pick next block size
      P0 = {s1, ..., sl};                           // construct next block
      return extend-tuple(P0, a-k-partition(k − 1, s(l+1); ...; sj));

Figure 10: Pseudo-code for non-deterministically computing a k-cover of an AMA formula, along
with a non-deterministic helper function for selecting a ≤ k block partition of the states
of a timeline.
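
The block-partition step of the k-cover computation can be illustrated with a short Python sketch.
It is a sketch under stated assumptions rather than the authors' code: the non-deterministic choice
in a-k-partition is replaced by a generator that enumerates every choice, states are frozensets of
propositions, and the subsumption-based pruning of redundant timelines is omitted. The names
k_partitions and k_ma_candidates are hypothetical.

```python
from functools import reduce

def k_partitions(states, k):
    """Yield every tuple of at most k blocks of consecutive states, following
    the recursion of a-k-partition with all choices enumerated."""
    j = len(states)
    if j <= k:                        # short enough: one block per state
        yield tuple((s,) for s in states)
        return
    if k == 1:                        # a single block holds everything
        yield (tuple(states),)
        return
    for size in range(1, j - k + 2):  # next block size in 1 .. j-k+1
        head = tuple(states[:size])
        for rest in k_partitions(states[size:], k - 1):
            yield (head,) + rest

def k_ma_candidates(states, k):
    """Intersect the states inside each block to obtain candidate k-MA timelines."""
    return {tuple(reduce(frozenset.intersection, block) for block in part)
            for part in k_partitions(states, k)}

AB, B_, BC = frozenset("AB"), frozenset("B"), frozenset("BC")
for timeline in sorted(k_ma_candidates((AB, B_, BC), 2), key=str):
    print([sorted(state) for state in timeline])
```

For the three-state timeline (A ∧ B); B; (B ∧ C) and k = 2, the two partitions give the candidates
(A ∧ B); B and B; (B ∧ C). A full k-cover would additionally conjoin the candidates from every input
timeline and prune those that subsume another candidate.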

5.2 Incremental Pairwise LGG Computation 

Our implemented learner computes the syntactic k-AMA LGG of AMA formula sets; however,
it does not directly use the algorithm described above. Rather than compute the LGG of formula
sets via a single call to the above algorithm, it is typically more efficient to break the computation
into a sequence of pairwise LGG calculations. Below we describe this approach and the potential
efficiency gains.

It is straightforward to show that for both syntactic and semantic subsumption we have that
LGG(Ψ1, ..., Ψm) = LGG(Ψ1, LGG(Ψ2, ..., Ψm)) where the Ψi are AMA formulas. Thus, by
recursively applying this transformation we can incrementally compute the LGG of m AMA
formulas via a sequence of m − 1 pairwise LGG calculations. Note that since the LGG operator is
commutative and associative the final result does not depend on the order in which we process the
formulas. We will refer to this incremental pairwise LGG strategy as the incremental approach and
to the strategy that makes a single call to the k-AMA LGG algorithm (passing in the entire formula
set) as the direct approach.
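
The incremental approach amounts to a fold of a pairwise operator over the input formulas. The
Python sketch below illustrates only that control structure. As a stand-in for the pairwise AMA LGG
it uses the LGG of two single states (the propositions they share), which is an assumption made for
brevity; the names incremental_lgg and state_lgg are hypothetical.

```python
from functools import reduce
from itertools import permutations

def incremental_lgg(formulas, pairwise_lgg):
    """Fold a pairwise LGG operator over a non-empty sequence of formulas,
    using len(formulas) - 1 pairwise calls."""
    return reduce(pairwise_lgg, formulas)

def state_lgg(s1, s2):
    """Stand-in pairwise LGG for single states: keep the shared propositions."""
    return s1 & s2

examples = [frozenset("abc"), frozenset("ab"), frozenset("acd")]
print(sorted(incremental_lgg(examples, state_lgg)))

# Commutativity and associativity make the result independent of processing order.
results = {incremental_lgg(list(order), state_lgg) for order in permutations(examples)}
print(len(results))
```

Any pairwise operator that is commutative and associative could be passed in place of state_lgg.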

To simplify the discussion we will consider computing the LGG of an MA formula set Σ; the
argument can be extended easily to AMA formulas (and hence to k-AMA). Recall that the syntactic
LGG algorithm of Figure 9 computes LGG(Σ) by conjoining timelines in IG(Σ) that do not
subsume any of the others, eliminating subsuming timelines in a form of pruning. The incremental
approach applies this pruning step after each pair of input formulas is processed; in contrast, the
direct approach must compute the interdigitation-generalization of all the input formulas before any
pruning can happen. The resulting savings can be substantial, and typically more than compensate
for the extra effort spent checking for pruning (i.e., testing subsumption between timelines as the
incremental LGG is computed). A formal approach to describing these savings can be constructed
based on the observation that both ⋃_{Φ ∈ IG({Φ1, Φ2})} IG({Φ} ∪ Σ) and
⋃_{Φ ∈ LGG(Φ1, Φ2)} IG({Φ} ∪ Σ) can be seen to compute the LGG of Σ ∪ {Φ1, Φ2}, but with the
latter being possibly much cheaper to compute due to pruning. That is, LGG(Φ1, Φ2) typically
contains a much smaller number of timelines than IG({Φ1, Φ2}).

Based on the above observations our implemented system uses the incremental approach to
compute the LGG of a formula set. We now describe an optimization used in our system to speed up
the computation of pairwise LGGs, compared to directly running the algorithm in Figure 9. Given a
pair of AMA formulas Ψ1 = Φ1,1 ∧ ··· ∧ Φ1,m and Ψ2 = Φ2,1 ∧ ··· ∧ Φ2,n, let Ψ be their syntactic
LGG obtained by running the algorithm in Figure 9. The algorithm constructs Ψ by computing
LGGs of all MA timeline pairs (i.e., LGG(Φ1,i, Φ2,j) for all i and j) and conjoining the results
while removing subsuming timelines. It turns out that we can often avoid computing many of these
MA LGGs. To see this, consider the case when there exist i and j such that Φ1,i ≤ Φ2,j. We know
LGG(Φ1,i, Φ2,j) = Φ2,j, which tells us that Φ2,j will be considered for inclusion into Ψ (it may
be pruned). Furthermore we know that any other LGG involving Φ2,j will subsume Φ2,j and thus
will be pruned from Ψ. This shows that we need not compute any MA LGGs involving Φ2,j; rather,
we need only to consider adding Φ2,j when constructing Ψ.
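
The observation above can be turned into a pairwise procedure that sets subsuming timelines aside
before doing any expensive work. The Python sketch below is one way to do this, offered under
several assumptions: formulas are sets of timelines, timelines are tuples of frozensets of
propositions, MA subsumption is tested by searching for a witnessing interdigitation, and
lgg_direct is a simplified stand-in for the algorithm of Figure 9. All names are hypothetical.

```python
from functools import lru_cache
from itertools import product

STEPS = ((1, 0), (1, 1), (0, 1))   # moves allowed in an interdigitation

def subsumed_by(specific, general):
    """True if some interdigitation pairs every state of `specific` only with
    states of `general` whose propositions it contains."""
    m, n = len(specific), len(general)

    @lru_cache(maxsize=None)
    def reach(i, j):
        if not specific[i] >= general[j]:
            return False
        if (i, j) == (m - 1, n - 1):
            return True
        return any(reach(i + di, j + dj) for di, dj in STEPS
                   if i + di < m and j + dj < n)
    return reach(0, 0)

def ig_pair(t1, t2):
    """All interdigitation-generalizations of two timelines."""
    out = set()
    def walk(i, j, acc):
        acc = acc + (t1[i] & t2[j],)
        if (i, j) == (len(t1) - 1, len(t2) - 1):
            out.add(acc)
            return
        for di, dj in STEPS:
            if i + di < len(t1) and j + dj < len(t2):
                walk(i + di, j + dj, acc)
    walk(0, 0, ())
    return out

def prune(timelines):
    """Keep only timelines that subsume no other kept timeline."""
    kept = []
    for t in timelines:
        if any(subsumed_by(k, t) for k in kept):
            continue                              # t is redundant
        kept = [k for k in kept if not subsumed_by(t, k)] + [t]
    return set(kept)

def lgg_direct(psi1, psi2):
    """Generalize every timeline pair, then prune once at the end."""
    cands = set()
    for t1, t2 in product(psi1, psi2):
        cands |= ig_pair(t1, t2)
    return prune(cands)

def lgg_skipping_subsumers(psi1, psi2):
    """Set aside timelines that already subsume a timeline of the other formula,
    generalize only the remainder, then add back the non-redundant ones."""
    subsumers = ({t for t in psi1 if any(subsumed_by(u, t) for u in psi2)} |
                 {t for t in psi2 if any(subsumed_by(u, t) for u in psi1)})
    rest1, rest2 = psi1 - subsumers, psi2 - subsumers
    core = lgg_direct(rest1, rest2) if rest1 and rest2 else set()
    extra = {t for t in subsumers if not any(subsumed_by(c, t) for c in core)}
    return core | extra

def syn_subsumed(psi_a, psi_b):
    """psi_a <=syn psi_b: every timeline of psi_b subsumes some timeline of psi_a."""
    return all(any(subsumed_by(a, b) for a in psi_a) for b in psi_b)
```

On small inputs the two procedures can be compared with syn_subsumed in both directions; they need
not return syntactically identical sets, since equivalent timelines may be represented differently.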

The above observation leads to a modified algorithm (used in our system) for computing the
syntactic LGG of a pair of AMA formulas. The new algorithm only computes LGGs between
non-subsuming timelines. Given AMA formulas Ψ1 and Ψ2, the modified algorithm proceeds as
follows: 1) Compute the subsumer set S = {Φ ∈ Ψ1 | ∃Φ' ∈ Ψ2 s.t. Φ' ≤ Φ} ∪ {Φ ∈ Ψ2 | ∃Φ' ∈
Ψ1 s.t. Φ' ≤ Φ}. 2) Let AMA Ψ1' (Ψ2') be the result of removing timelines from Ψ1 (Ψ2) that are
in S. 3) Let Ψ' be the syntactic LGG of Ψ1' and Ψ2' computed by running the algorithm in Figure 9
(if either Ψi' is empty then Ψ' will be empty). 4) Let S' be the conjunction of timelines in S that do
not subsume any timeline in Ψ'. 5) Return Ψ = Ψ' ∧ S'. This method avoids computing MA LGGs
involving subsuming timelines (an exponential operation) at the cost of performing polynomially
many MA subsumption tests (a polynomial operation). We have noticed a significant advantage to
using this procedure in our experiments. In particular, the advantage tends to grow as we process
more training examples. This is due to the fact that as we incrementally process training examples
the resulting formulas become more general; thus, these more general formulas are likely to have
more subsuming timelines. In the best case, when Ψ1 ≤syn Ψ2 (i.e., all timelines in Ψ2 are
subsuming), we see that step 2 produces an empty formula and thus step 3 (the expensive step)
performs no work; in this case we return the set S = Ψ2 as desired.

5.3 Relational Data

LEONARD produces relational models that involve objects and (force dynamic) relations between
those objects. Thus event definitions include variables to allow generalization over objects. For
example, a definition for PickUp(x, y, z) recognizes both PickUp(hand, block, table) and
PickUp(man, box, floor). Despite the fact that our k-AMA learning algorithm is propositional, we
are still able to use it to learn relational definitions.

We take a straightforward object-correspondence approach to relational learning. We view the
models output by LEONARD as containing relations applied to constants. Since we (currently)
support only supervised learning, we have a set of distinct training examples for each event type.
There is an implicit correspondence between the objects filling the same role across the different
training models for a given type. For example, models showing PickUp(hand, block, table)
and PickUp(man, box, floor) have implicit correspondences given by (hand, man), (block, box),
and (table, floor). We outline two relational learning methods that differ in how much
object-correspondence information they require as part of the training data.

5.3.1 Complete Object Correspondence 

This first approach assumes that a complete object correspondence is given, as input, along with
the training examples. Given such information, we can propositionalize the training models by
replacing corresponding objects with unique constants. The propositionalized models are then given
to our propositional k-AMA learning algorithm, which returns a propositional k-AMA formula. We
then lift this propositional formula by replacing each constant with a distinct variable. Lavrac et al.
(1991) have taken a similar approach.
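
The propositionalize-then-lift idea can be sketched in a few lines of Python. This is an
illustration under assumptions, not the system's code: a model is a list of moments, each a set of
(relation, argument, ...) tuples, and the role constants c1, c2, c3 and the function names are
hypothetical.

```python
def propositionalize(model, roles):
    """Rename every object to the shared constant of the role it fills.
    model: list of sets of (relation, arg, ...) tuples, one set per moment.
    roles: dict from object name to role constant."""
    return [frozenset((rel, *[roles[a] for a in args]) for rel, *args in state)
            for state in model]

def lift(formula, variables):
    """Replace each role constant with a distinct variable."""
    return [frozenset((rel, *[variables[c] for c in args]) for rel, *args in state)
            for state in formula]

movie1 = [{("Supports", "table", "block")}, {("Supports", "hand", "block")}]
movie2 = [{("Supports", "floor", "box")}, {("Supports", "man", "box")}]

p1 = propositionalize(movie1, {"hand": "c1", "block": "c2", "table": "c3"})
p2 = propositionalize(movie2, {"man": "c1", "box": "c2", "floor": "c3"})
print(p1 == p2)

# A propositional learner would run on p1, p2, ...; here p1 stands in for its output.
print(lift(p1, {"c1": "x", "c2": "y", "c3": "z"}))
```

After renaming, the two movies become identical propositional models, which is what lets a purely
propositional learner generalize across them.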

5.3.2 Partial Object Correspondence 

The above approach assumes complete object-correspondence information. While it is sometimes
possible to provide all correspondences (for example, by color-coding objects that fill identical
roles when recording training movies), such information is not always available. When only a
partial object correspondence (or even none at all) is available, we can automatically complete the
correspondence and apply the above technique.

For the moment, assume that we have an evaluation function that takes two relational models
and a candidate object correspondence, as input, and yields an evaluation of correspondence
quality. Given a set of training examples with missing object correspondences, we perform a greedy
search for the best set of object-correspondence completions over the models. Our method works
by storing a set P of propositionalized training examples (initially empty) and a set U of
unpropositionalized training examples (initially the entire training set). For the first step, when P is
empty, we evaluate all pairs of examples from U, under all possible correspondences, select the pair
that yields the highest score, remove the examples involved in that pair from U, propositionalize
them according to the best correspondence, and add them to P. For each subsequent step, we use the
previously computed values of all pairs of examples, one from U and one from P, under all possible
correspondences. We then select the example from U and correspondence that yields the highest
average score relative to all models in P; this example is removed from U, propositionalized
according to the winning correspondence, and added to P. For a fixed number of objects, the effort
expended here is polynomial in the size of the training set; however, if the number of objects b that
appear in a training example is allowed to grow, the number of correspondences that must be
considered grows exponentially in b. For this reason, it is important that the events involved
manipulate only a modest number of objects.

Our evaluation function is based on the intuition that object roles for visual events (as well as
events from other domains) can often be inferred by considering the changes between the initial
and final moments of an event. Specifically, given two models and an object correspondence, we
first propositionalize the models according to the correspondence. Next, we compute ADD and
DELETE lists for each model. The ADD list is the set of propositions that are true at the final
moment but not the initial moment. The DELETE list is the set of propositions that are true at the
initial moment but not the final moment. These add and delete lists are motivated by STRIPS action
representations (Fikes & Nilsson, 1971). Given such ADDi and DELETEi lists for models 1 and 2,
the evaluation function returns the sum of the cardinalities of ADD1 ∩ ADD2 and DELETE1 ∩
DELETE2. This heuristic measures the similarity between the ADD and DELETE lists of the two
models. The intuition behind this heuristic is similar to the intuition behind the STRIPS
action-description language; i.e., that most of the differences between the initial and final moments
of an event occurrence are related to the target event, and that event effects can be described by ADD
and DELETE lists. We have found that this evaluation function works well in the visual-event domain.
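
The evaluation function lends itself to a direct sketch. The Python code below is a minimal
illustration under stated assumptions: a model is a list of moments, each a set of (relation,
argument, ...) tuples; a correspondence is a dict renaming the second model's objects to the first
model's; both models contain the same number of objects; and the best correspondence for one pair
is found by exhaustive search. The function names are hypothetical, and the greedy search over the
whole training set is not shown.

```python
from itertools import permutations

def add_delete(model):
    """ADD: true at the final moment but not the initial one. DELETE: the reverse."""
    first, last = model[0], model[-1]
    return last - first, first - last

def rename(model, mapping):
    """Apply an object renaming to every relation instance in the model."""
    return [frozenset((rel, *[mapping[a] for a in args]) for rel, *args in state)
            for state in model]

def score(model1, model2, correspondence):
    """|ADD1 ∩ ADD2| + |DELETE1 ∩ DELETE2| after renaming model2's objects."""
    add1, del1 = add_delete(model1)
    add2, del2 = add_delete(rename(model2, correspondence))
    return len(add1 & add2) + len(del1 & del2)

def best_correspondence(model1, objs1, model2, objs2):
    """Try every one-to-one pairing of the two object lists and keep the best."""
    candidates = (dict(zip(perm, objs1)) for perm in permutations(objs2))
    return max(candidates, key=lambda c: score(model1, model2, c))

m1 = [frozenset({("Supports", "table", "block"), ("Contacts", "table", "block")}),
      frozenset({("Supports", "hand", "block"), ("Attached", "hand", "block")})]
m2 = [frozenset({("Supports", "floor", "box"), ("Contacts", "floor", "box")}),
      frozenset({("Supports", "man", "box"), ("Attached", "man", "box")})]

best = best_correspondence(m1, ["hand", "block", "table"], m2, ["man", "box", "floor"])
print(best, score(m1, m2, best))
```

Because every permutation is tried, the cost of this search grows quickly with the number of
objects, which is the reason the text asks for events that involve only a few of them.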

Note that when full object correspondences are given to the learner (rather than automatically
extracted by the learner), the training examples are interpreted as specifying that the target event
took place as well as which objects filled the various event roles (e.g., PickUp(a, b, c)). In contrast,
when no object correspondences are provided, the training examples are interpreted as specifying the
existence of a target event occurrence but do not specify which objects fill the roles (i.e., the training
example is labeled by PickUp rather than PickUp(a, b, c)). Accordingly, the rules learned when no
correspondences are provided only allow us to infer that a target event occurred and not which
objects filled the event roles. For example, when object correspondences are manually provided the
learner might produce the rule,

    PickUp(x, y, z) = (Supports(z, y) ∧ Contacts(z, y));
                      (Supports(x, y) ∧ Attached(x, y))

whereas a learner that automatically extracts the correspondences would instead produce the rule,

    PickUp = (Supports(z, y) ∧ Contacts(z, y));
             (Supports(x, y) ∧ Attached(x, y))

It is worth noting, however, that upon producing the second rule the availability of a single training
example with correspondence information allows the learner to determine the roles of the variables,
upon which it can output the first rule. Thus, under the assumption that the learner can reliably
extract object correspondences, we need not label all training examples with correspondence
information in order to obtain definitions that explicitly recognize object roles.

5.4 Negative Information

The AMA language does not allow negated propositions. Negation, however, is sometimes
necessary to adequately define an event type. In this section, we consider the language AMA~, which
is a superset of AMA, with the addition of negated propositions. We first give the syntax and
semantics of AMA~, and extend AMA syntactic subsumption to AMA~. Next, we describe our
approach to learning AMA~ formulas using the above-presented algorithms for AMA. We show that
our approach correctly computes the AMA~ LGCF and the syntactic AMA~ LGG. Finally, we discuss
an alternative, related approach to adding negation designed to reduce the overfitting that appears to
result from the full consideration of negated propositions.

AMA~ has the same syntax as AMA, only with a new grammar for building states with negated
propositions:

    literal ::= true | prop | ¬◇prop
    state ::= literal | literal ∧ state

where prop is any primitive proposition. The semantics of AMA~ are the same as for AMA except
for state satisfaction.

• A positive literal P (negative literal ¬◇P) is satisfied by model (M, I) iff M[x] assigns P
true (false), for every x ∈ I.¹⁰

• A state l1 ∧ ··· ∧ lm is satisfied by model (M, I) iff each literal li is satisfied by (M, I).
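
The two satisfaction clauses can be checked with a small Python sketch. It assumes a model is a
list whose entry at index x is the set of propositions true at time x, that an interval is an
inclusive pair of indices, and that a literal is either the string "true" or a pair ("pos", p) or
("neg", p), where ("neg", p) stands for ¬◇p. This encoding and the function names are hypothetical.

```python
def satisfies_literal(model, interval, literal):
    """A positive literal must hold at every point of the interval; a negative
    literal requires the proposition to be false at every point."""
    if literal == "true":
        return True
    kind, p = literal
    start, end = interval
    points = model[start:end + 1]
    if kind == "pos":
        return all(p in m for m in points)
    return all(p not in m for m in points)

def satisfies_state(model, interval, state):
    """A state (a conjunction of literals) holds iff each of its literals holds."""
    return all(satisfies_literal(model, interval, lit) for lit in state)

model = [{"a"}, {"a", "c"}, {"a"}]
print(satisfies_state(model, (0, 2), [("pos", "a")]))               # True
print(satisfies_state(model, (0, 2), [("pos", "a"), ("neg", "c")])) # False
print(satisfies_state(model, (0, 0), [("pos", "a"), ("neg", "c")])) # True
```

Both kinds of literal are checked at every point of the interval, so whenever a state holds on an
interval it also holds on each of its subintervals.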

10. We note that it is important that we use the notation ¬◇P rather than just ¬P. In event-logic,
the formula ¬P is satisfied by a model whenever P is false at some instant in the model. Rather,
event-logic interprets ¬◇P as indicating that P is never true in the model (as defined above). Notice
that the first form of negation does not yield a liquid property; i.e., ¬P can be true along an interval
but not necessarily during all subintervals. The second form of negation, however, does yield a
liquid property provided that P is liquid. This is important to our learning algorithms, since they all
assume states are built from liquid properties.

Subsumption. An important difference between AMA and AMA~ is that Proposition 2,
establishing the existence of witnessing interdigitations to MA subsumption, is no longer true for
MA~. In other words, if we have two timelines Φ1, Φ2 ∈ AMA~, such that Φ1 ≤ Φ2, there need not
be an interdigitation that witnesses Φ1 ≤ Φ2. To see this, consider the AMA~ timelines:

    Φ1 = (a ∧ b ∧ c); b; a; b; (a ∧ b ∧ ¬◇c)
    Φ2 = b; a; c; a; b; a; ¬◇c; a; b

We can then argue:

1. There is no interdigitation that witnesses Φ1 ≤ Φ2. To see this, first show that, in any such
witness, the second and fourth states of Φ1 (each just "b") must interdigitate to align with
either the first and fifth, or the fifth and ninth states of Φ2 (also, each just "b"). But in either
of these cases, the third state of Φ1 will interdigitate with states of Φ2 that do not subsume it.

2. Even so, we still have that Φ1 ≤ Φ2. To see this, consider any model (M, I) that satisfies Φ1.
There must be an interval [i1, i2] within I such that (M, [i1, i2]) satisfies the third state of Φ1,
that is the state "a." We have two cases:

(a) The proposition c is true at some point in (M, [i1, i2]). Then, one can verify that (M, I)
satisfies both Φ1 and Φ2 in the following alignment:

    Φ1 = (a ∧ b ∧ c); b  |  a        |  b  |  (a ∧ b ∧ ¬◇c)
    Φ2 = b               |  a; c; a  |  b  |  a; ¬◇c; a; b

(b) The proposition c is false everywhere in (M, [i1, i2]). Then, one can verify that (M, I)
satisfies both Φ1 and Φ2 in the following alignment:

    Φ1 = (a ∧ b ∧ c)     |  b  |  a           |  b; (a ∧ b ∧ ¬◇c)
    Φ2 = b; a; c; a      |  b  |  a; ¬◇c; a   |  b

It follows that Φ1 ≤ Φ2.
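
The first claim, that no interdigitation witnesses Φ1 ≤ Φ2, can be checked by exhaustive search.
The Python sketch below does so under the assumption that a witness is a monotone chain of
co-occurring state pairs in which each state of Φ1 contains all literals of its partner in Φ2; the
literal ¬◇c is encoded as an ordinary string. The function name has_witness is hypothetical.

```python
from functools import lru_cache

def has_witness(specific, general):
    """Search for an interdigitation in which every state of `specific` contains
    all literals of each state of `general` that it co-occurs with."""
    m, n = len(specific), len(general)

    @lru_cache(maxsize=None)
    def reach(i, j):
        if not specific[i] >= general[j]:
            return False
        if (i, j) == (m - 1, n - 1):
            return True
        return any(reach(i + di, j + dj)
                   for di, dj in ((1, 0), (1, 1), (0, 1))
                   if i + di < m and j + dj < n)
    return reach(0, 0)

a, b, c, never_c = "a", "b", "c", "¬◇c"
phi1 = [{a, b, c}, {b}, {a}, {b}, {a, b, never_c}]
phi2 = [{b}, {a}, {c}, {a}, {b}, {a}, {never_c}, {a}, {b}]

print(has_witness(phi1, phi2))   # False: no witnessing interdigitation exists
print(has_witness(phi1, phi1))   # True: a timeline trivially witnesses itself
```

The search only establishes the absence of a syntactic witness; the semantic subsumption itself
rests on the case analysis given in the text.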

In light of such examples, we conjecture that it is computationally hard to compute AMA~
subsumption even between timelines. For this reason, we extend our definition of syntactic
subsumption to AMA~ in a way that provides a clearly tractable subsumption test analogous to that
discussed above for AMA.

Definition 6. AMA~ Ψ1 is syntactically subsumed by AMA~ Ψ2 (written Ψ1 ≤syn Ψ2) iff for
each timeline Φ2 ∈ Ψ2, there is a timeline Φ1 ∈ Ψ1 such that there is a witnessing interdigitation
for Φ1 ≤ Φ2.

The difference between the definition here and the previous one for AMA is that here we only need
to test for witnessing interdigitations between timelines rather than subsumption between timelines.
For AMA formulas, we note that the new and old definitions are equivalent (due to Proposition 2);
however, for AMA~ the new definition is weaker, and will result in more general LGG formulas. As
one might expect, AMA~ syntactic subsumption implies semantic subsumption and can be tested
in polynomial time using the subsumption graph described in Lemma 8 to test for witnesses.

Learning. Rather than design new LGCF and LGG algorithms to directly handle AMA~, we
instead compute these functions indirectly by applying our algorithms for AMA to a transformed
problem. Intuitively, we do this by adding new propositions to our models (i.e., the training
examples) that represent the proposition negations. Assume that the training-example models are over
the set of propositions P = {p1, ..., pn}. We introduce a new set P̄ = {p̄1, ..., p̄n} of propositions
and use these to construct new training models over P ∪ P̄ by assigning true to p̄i at a time in a
model iff pi is false in the model at that time. After forming the new set of training models (each
with twice as many propositions as the original models) we compute the least general AMA formula
that covers the new models (by computing the AMA LGCFs and applying the syntactic AMA LGG
algorithm), resulting in an AMA formula Ψ over the propositions P ∪ P̄. Finally we replace each p̄i
in Ψ with ¬◇pi, resulting in an AMA~ formula Ψ' over propositions in P; it turns out that under
syntactic subsumption Ψ' is the least general AMA~ formula that covers the original training
models.
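
The two mechanical steps of this transformation, augmenting the models and rewriting the learned
formula, can be sketched in Python. The sketch assumes a model is a list of sets of true
propositions (one per time point) and encodes the new proposition for p as the pair ("not", p).
That encoding and the function names are hypothetical, and the AMA learner that runs between the
two steps is not shown.

```python
def add_negation_props(model, props):
    """Return a copy of `model` in which ("not", p) is true exactly when p is false."""
    return [frozenset(state) | {("not", p) for p in props if p not in state}
            for state in model]

def restore_negation(timeline):
    """Rewrite each helper proposition ("not", p) as the literal "¬◇p"."""
    return [{f"¬◇{lit[1]}" if isinstance(lit, tuple) else lit for lit in state}
            for state in timeline]

model = [{"a"}, {"a", "b"}, set()]
augmented = add_negation_props(model, ["a", "b"])
for state in augmented:
    print(sorted(state, key=str))

# Suppose a learner returned this one-state timeline over the augmented propositions.
learned = [{"a", ("not", "b")}]
print(restore_negation(learned))
```

In the augmented models exactly one of p and ("not", p) holds at each time point, which is the
structure the text later notes a specialized AMA~ algorithm could exploit.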

We now show the correctness of the above transformational approach to computing the AMA~
LGCF and syntactic LGG. First, we introduce some notation. Let ℳ be the set of all models over
P. Let ℳ̄ be the set of models over P ∪ P̄, such that at any time, for each i, exactly one of pi
and p̄i is true. Let T be the following mapping from ℳ to ℳ̄: for (M, I) ∈ ℳ, T[(M, I)] is the
unique (M', I) ∈ ℳ̄ such that for all j ∈ I and all i, M'(j) assigns pi true iff M(j) assigns pi
true. Notice that the inverse of T is a functional mapping from ℳ̄ to ℳ. Our approach to handling
negation using purely AMA algorithms begins by applying T to the original training models. In
what follows, we consider AMA~ formulas over the propositions in P, and AMA formulas over
the propositions in P ∪ P̄.

Let F be a mapping from AMA~ to AMA where for Ψ ∈ AMA~, F[Ψ] is an AMA formula
identical to Ψ except that each ¬◇pi in Ψ is replaced with p̄i. Notice that the inverse of F is a
function from AMA to AMA~ and corresponds to the final step in our approach described above. The
following lemma shows that there is a one-to-one correspondence between satisfaction of AMA~
formulas by models in ℳ and satisfaction of AMA formulas by models in ℳ̄.

Lemma 26. For any model (M, I) ∈ ℳ and any Ψ ∈ AMA~, Ψ covers (M, I) iff F[Ψ] covers
T[(M, I)].

Using this lemma, it is straightforward to show that our transformational approach computes the
AMA~ LGCF under semantic subsumption (and hence under syntactic subsumption).

Proposition 27. For any (M, I) ∈ ℳ, let Φ be the AMA LGCF of the model T[(M, I)]. Then,
F⁻¹[Φ] is the unique AMA~ LGCF of (M, I), up to equivalence.

Proof: We know that Φ covers T[(M, I)], therefore by Lemma 26 we know that F⁻¹[Φ] covers
(M, I). We now show that F⁻¹[Φ] is the least-general formula in AMA~ that covers (M, I). For
the sake of contradiction assume that some Φ' ∈ AMA~ covers (M, I) but that Φ' < F⁻¹[Φ]. It
follows that there is some model (M', I') that is covered by F⁻¹[Φ] but not by Φ'. By Lemma 26
we have that F[Φ'] covers T[(M, I)] and since Φ is the unique AMA LGCF of T[(M, I)], up to
equivalence, we have that Φ ≤ F[Φ']. However, we also have that T[(M', I')] is covered by Φ
but not by F[Φ'], which gives a contradiction. Thus, no such Φ' can exist. It follows that F⁻¹[Φ] is
an AMA~ LGCF. The uniqueness of the AMA~ LGCF up to equivalence follows because AMA~ is
closed under conjunction; so that if there were any two non-equivalent LGCF formulas, they could
be conjoined to get an LGCF formula strictly less than one of them. □

Below we use the fact that the F operator preserves syntactic subsumption. In particular, given
two MA~ timelines Φ1, Φ2, it is clear that any witnessing interdigitation of Φ1 ≤ Φ2 can be trivially
converted into a witness for F[Φ1] ≤ F[Φ2] (and vice versa). Since syntactic subsumption is defined
in terms of witnessing interdigitations, it follows that for any Ψ1, Ψ2 ∈ AMA~, (Ψ1 ≤syn Ψ2) iff
(F[Ψ1] ≤syn F[Ψ2]). Using this property, it is straightforward to show how to compute the syntactic
AMA~ LGG using the syntactic AMA LGG algorithm.

Proposition 28. For any AMA~ formulas Ψ1, ..., Ψm, let Ψ be the syntactic AMA LGG of
{F[Ψ1], ..., F[Ψm]}. Then, F⁻¹[Ψ] is the unique syntactic AMA~ LGG of {Ψ1, ..., Ψm}.

Proof: We know that for each i, F[Ψi] ≤syn Ψ; thus, since F⁻¹ preserves syntactic subsumption,
we have that for each i, Ψi ≤syn F⁻¹[Ψ]. This shows that F⁻¹[Ψ] is a generalization of the inputs.
We now show that F⁻¹[Ψ] is the least such formula. For the sake of contradiction assume that
F⁻¹[Ψ] is not least. It follows that there must be a Ψ' ∈ AMA~ such that Ψ' <syn F⁻¹[Ψ] and for
each i, Ψi ≤syn Ψ'. Combining this with the fact that F preserves syntactic subsumption, we get
that F[Ψ'] <syn Ψ and for each i, F[Ψi] ≤syn F[Ψ']. But this contradicts the fact that Ψ is an LGG;
so we must have that F⁻¹[Ψ] is a syntactic AMA~ LGG. As argued elsewhere, the uniqueness of
this LGG follows from the fact that AMA~ is closed under conjunction. □

These propositions ensure the correctness of our transformational approach to computing the
syntactic LGG within AMA~. For the case of semantic subsumption, the transformational approach
does not correctly compute the AMA~ LGG. To see this, recall that above we have given two
timelines Φ1, Φ2 ∈ AMA~, such that Φ1 ≤ Φ2, but there is no witnessing interdigitation. Clearly under
semantic subsumption, the AMA~ LGG of Φ1 and Φ2 is Φ2. However, the semantic AMA LGG of
F[Φ1] and F[Φ2] is not F[Φ2]. The reason for this is that since there is no witness to F[Φ1] ≤ F[Φ2]
(and the F[Φi] are MA timelines), we know by Proposition 2 that F[Φ1] ≰ F[Φ2]. Thus, F[Φ2]
cannot be returned as the AMA LGG, since it does not subsume both input formulas; this shows
that the transformational approach will not return Φ2 = F⁻¹[F[Φ2]]. Here, the transformational
approach will produce an AMA~ formula that is more general than Φ2.

On the computational side, we note that, since the transformational approach doubles the
number of propositions in the training data, algorithms specifically designed for AMA~ may be more
efficient. Such algorithms might leverage the special structure of the transformed examples that our
AMA algorithms ignore; in particular, that exactly one of pi or p̄i is true at any time.

Boundary Negation. In our experiments, we actually compare two methods for assigning truth
values to the p̄i propositions in the training data models. The first method, called full negation,
assigns truth values as described above, yielding the syntactically least-general AMA~ formula that
covers the examples. We found, however, that using full negation often results in learning overly
specific formulas. To help alleviate this problem, our second method places a bias on the use of
negation. Our choice of bias is inspired by the idea that, often, much of the useful information for
characterizing an event type is in its pre- and post-conditions. The second method, called boundary
negation, differs from full negation in that it only allows p̄i to be true in the initial and final moments
of a model (and then only if pi is false); p̄i must be false at all other times. That is, we only allow
"informative" negative information at the beginnings and ends of the training examples. We have
found that boundary negation provides a good trade-off between no negation (i.e., AMA), which
often produces overly general results, and full negation (i.e., AMA~), which often produces overly
specific and much more complicated results.
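
Boundary negation differs from full negation only in where the negation propositions may be set.
The Python sketch below illustrates this under the same assumptions as before: a model is a list
of sets of true propositions, and the negation proposition for p is encoded as the pair
("not", p). The encoding and the function name are hypothetical.

```python
def boundary_negation(model, props):
    """Add ("not", p) for each false p, but only at the first and last moments."""
    last = len(model) - 1
    out = []
    for t, state in enumerate(model):
        if t in (0, last):
            negs = {("not", p) for p in props if p not in state}
        else:
            negs = set()            # interior moments carry no negative information
        out.append(frozenset(state) | negs)
    return out

model = [{"a"}, set(), {"b"}]
for state in boundary_negation(model, ["a", "b"]):
    print(sorted(state, key=str))
```

The middle moment, where both a and b are false, receives no negation propositions at all, whereas
full negation would have added both.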

5.5 Overall Complexity and Scalability 

We now review the overall complexity of our visual event learning component and discuss some
scalability issues. Given a training set of temporal models (i.e., a set of movies), our system does the
following: 1) Propositionalize the training models, translating negation as described in Section 5.4.
2) Compute the LGCF of each propositional model. 3) Compute the k-AMA LGG of the LGCFs.
4) Return a lifted (variablized) version of the LGG. Steps two and four require little computational
overhead, being linear in the sizes of the input and output respectively. Steps one and three are
the computational bottlenecks of the system; they encompass the inherent exponential complexity
arising from the relational and temporal problem structure.

Step One. Recall from Section 5.3.2 that our system allows the user to annotate training
examples with object correspondence information. Our technique for propositionalizing the models was
shown to be exponential in the number of unannotated objects in a training example. Thus, our
system requires that the number of objects be relatively small or that correspondence information
be given for all but a small number of objects. Often the event class definitions we are interested
in do not involve a large number of objects. When this is true, in a controlled learning setting we
can manage the relational complexity by generating training examples with only a small number of
(or zero) irrelevant objects. This is the case for all of the domains studied empirically in this paper.

In a less controlled setting, the number of unannotated objects may prohibit the use of our
correspondence technique; there are at least three ways one might proceed. First, we can try to
develop efficient domain-specific techniques for filtering objects and finding correspondences. That
is, for a particular problem it may be possible to construct a simple filter that removes irrelevant
objects from consideration and then to find correspondences for any remaining objects. Second, we
can provide the learning algorithm with a set of hand-coded first-order formulas, defining a set of
domain-specific features (e.g., in the spirit of Roth & Yih, 2001). These features can then be used
to propositionalize the training instances. Third, we can draw upon ideas from relational learning to
design a "truly first-order" version of the k-AMA learning algorithm. For example, one could use
existing first-order generalization algorithms to generalize relational state descriptions. Effectively
this approach pushes the object correspondence problem into the k-AMA learning algorithm rather
than treating it as a preprocessing step. Since it is well known that computing first-order LGGs can
be intractable (Plotkin, 1971), practical generalization algorithms retain tractability by constraining
the LGGs in various ways (e.g., Muggleton & Feng, 1992; Morales, 1997).

Step Three. Our system uses the ideas of Section 5.2 to speed up the k-AMA LGG computation
for a set of training data. Nevertheless, the computational complexity is still exponential in k; thus,
in practice we are restricted to using relatively small values of k. While this restriction did not limit
performance in our visual event experiments, we expect that it will limit the direct applicability
of our system to more complex problems. In particular, many event types of interest may not
be adequately represented via k-AMA when k is small. Such event types, however, often contain
significant hierarchical structure; i.e., they can be decomposed into a set of "short" sub-event types.
An interesting research direction is to consider using our k-AMA learner as a component of a
hierarchical learning system; there it could be used to learn k-AMA sub-event types. We note
that our learner alone cannot be applied hierarchically because it requires liquid primitive events,
but learns non-liquid composite event types. Further work is required (and intended) to construct a
hierarchical learner based perhaps on non-liquid AMA learning.

Finally, recall that to compute the LGG of m examples, our system uses a sequence of m - 1
pairwise LGG calculations. For a fixed k, each pairwise calculation takes polynomial time. However,
since the size of a pairwise LGG can grow by at least a constant factor with respect to the
inputs, the worst-case time complexity of computing the sequence of m - 1 pairwise LGGs is
exponential in m. We expect that this worst case will primarily occur when the target event type does not
have a compact k-AMA representation, in which case a hierarchical approach as described above
is more appropriate. When there is a compact representation, our empirical experience indicates
that such growth does not occur; in particular, each pairwise LGG tends to yield significant pruning.
For such problems, reasonable assumptions about the amount of pruning^11 imply that the time
complexity of computing the sequence of m - 1 pairwise LGGs is polynomial in m.
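
The folding scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the system's implementation: the names lgg_fold and state_lgg are hypothetical, the proposition strings are made up, and the pairwise step shown is only the LGG of two single states (each a conjunction of propositions, so the LGG keeps the shared propositions), standing in for the full pairwise k-AMA LGG.

```python
from functools import reduce


def state_lgg(s1, s2):
    """Toy pairwise LGG for two states, each a set of propositions read as a conjunction.

    The most specific conjunction covering both states keeps only the shared
    propositions. This stands in for the far more involved pairwise k-AMA LGG.
    """
    return frozenset(s1) & frozenset(s2)


def lgg_fold(examples, pairwise_lgg):
    """Fold m examples into one hypothesis using m - 1 pairwise LGG calls."""
    if not examples:
        raise ValueError("need at least one example")
    return reduce(pairwise_lgg, examples)


# Illustrative proposition names only.
examples = [
    {"Supports(z,y)", "Contacts(z,y)", "Attached(x,y)"},
    {"Supports(z,y)", "Contacts(z,y)"},
    {"Supports(z,y)", "Attached(x,y)"},
]
hypothesis = lgg_fold(examples, state_lgg)
```

In this toy run every step prunes, so the hypothesis shrinks. If instead each pairwise step could enlarge its result by a constant factor c > 1, the hypothesis after m - 1 steps could be as large as c^(m - 1) times the input size, which is the exponential worst case noted above.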

11. In particular, assume that the size of a pairwise k-AMA LGG is "usually" bounded by the sizes of the k-covers of the
inputs.

6. Experiments
6.1 Data Set

Our data set contains examples of 7 different event types: pick up, put down, stack, unstack, move,
assemble, and disassemble. Each of these involves a hand and two to three blocks. For a detailed
description and sample video sequences of these event types, see Siskind (2001). Key frames from
sample video sequences of these event types are shown in Figure 11. The results of segmentation,
tracking, and model reconstruction are overlaid on the video frames. We recorded 30 movies for
each of the 7 event classes resulting in a total of 210 movies comprising 11946 frames.^12 We
replaced one assemble movie (assemble-left-qobi-04) with a duplicate copy of another
(assemble-left-qobi-11) because of segmentation and tracking errors.

Some of the event classes are hierarchical in that occurrences of events in one class contain
occurrences of events in one or more simpler classes. For example, a movie depicting a
MOVE(a, b, c, d) event (i.e., a moves b from c to d) contains subintervals where PICKUP(a, b, c)
and PUTDOWN(a, b, d) events occur. In our experiments, when learning the definition of an event
class, only the movies for that event class are used in training. We do not train on movies for other
event classes that may also depict an occurrence of the event class being learned as a subevent.
However, in evaluating the learned definitions, we wish to detect both the events that correspond to
an entire movie and the subevents that correspond to portions of that movie. For example, given a
movie depicting a MOVE(a, b, c, d) event, we wish to detect not only the MOVE(a, b, c, d) event but
also the PICKUP(a, b, c) and PUTDOWN(a, b, d) subevents. For each movie type in our data
set, we have a set of intended events and subevents that should be detected. If a definition does not
detect an intended event, we deem the error a false negative. If a definition detects an unintended
event, we deem the error a false positive. For example, if a movie depicts a MOVE(a, b, c, d) event,
the intended events are MOVE(a, b, c, d), PICKUP(a, b, c), and PUTDOWN(a, b, d). If the definition
for pick up detects the occurrence of PICKUP(c, b, a) and PICKUP(b, a, c), but not PICKUP(a, b, c),
it will be charged two false positives as well as one false negative. We evaluate our definitions in
terms of false positive and negative rates as described below.
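
The error-counting rule in the example above amounts to two set differences. The following Python sketch reproduces that example; the function name count_errors and the tuple encoding of events are assumptions made for illustration.

```python
def count_errors(intended, detected):
    """Return (false_positives, false_negatives) for one movie and one event definition."""
    intended, detected = set(intended), set(detected)
    false_positives = len(detected - intended)  # detected but not intended
    false_negatives = len(intended - detected)  # intended but not detected
    return false_positives, false_negatives


# The pick up definition applied to a movie depicting MOVE(a, b, c, d).
intended_pickups = {("PICKUP", "a", "b", "c")}
detected_pickups = {("PICKUP", "c", "b", "a"), ("PICKUP", "b", "a", "c")}
errors = count_errors(intended_pickups, detected_pickups)
```

Here errors holds two false positives and one false negative, matching the charge described in the text.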

6.2 Experimental Procedure 

For each event type, we evaluate the k-AMA learning algorithm using a leave-one-movie-out
cross-validation technique with training-set sampling. The parameters to our learning algorithm are k
and the degree D of negative information used. The value of D is either P, for positive propositions
only, BN, for boundary negation, or N, for full negation. The parameters to our evaluation procedure
include the target event type E and the training-set size N. Given this information, the evaluation
proceeds as follows: For each movie M (the held-out movie) from the 210 movies, apply the
k-AMA learning algorithm to a randomly drawn training sample of N movies from the 30 movies of
event type E (or 29 movies if M is one of the 30). Use LEONARD to detect all occurrences of the
learned event definition in M. Based on E and the event type of M, record the number of false
positives and false negatives in M, as detected by LEONARD. Let FP and FN be the total number
of false positives and false negatives observed over all 210 held-out movies respectively. Repeat the
entire process of calculating FP and FN 10 times and record the averages as FP and FN.^13
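
The procedure above can be sketched as a small Python loop. This is only an outline under stated assumptions: the callables learn and score are hypothetical stand-ins for the k-AMA learner and for LEONARD-based detection plus error counting, and the dictionary layout of a movie is invented for the example.

```python
import random
from statistics import mean


def leave_one_movie_out(movies, event_type, n_train, learn, score, repeats=10, seed=0):
    """Average the total false positives/negatives over `repeats` leave-one-movie-out passes.

    movies: list of dicts with at least a "type" key (hypothetical layout).
    learn(sample) -> definition; score(definition, movie) -> (fp, fn) for that movie.
    """
    rng = random.Random(seed)
    fp_totals, fn_totals = [], []
    for _ in range(repeats):
        fp_total = fn_total = 0
        for held_out in movies:
            # Training pool: movies of the target type, never the held-out movie itself.
            pool = [m for m in movies if m["type"] == event_type and m is not held_out]
            sample = rng.sample(pool, min(n_train, len(pool)))
            definition = learn(sample)
            fp, fn = score(definition, held_out)
            fp_total += fp
            fn_total += fn
        fp_totals.append(fp_total)
        fn_totals.append(fn_total)
    return mean(fp_totals), mean(fn_totals)
```

Excluding the held-out movie from the pool is what yields 29 rather than 30 candidate training movies when the held-out movie is itself of the target type.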

12. The source code and all of the data used for these experiments are available as Online Appendix 1, and also from
ftp://ftp.ecn.purdue.edu/qobi/ama.tar.Z.

13. While we did not record the times for our experiments, the system is fast enough to give live demos when N = 29
and k = 3 with boundary negation, giving the best results we show here (though we don't typically record 29 training
videos in a live demo for other reasons). Some of the less favorable parameter settings (particularly k = 4 and full
negation) can take a (real-time) hour or so.

Since some event types occur more frequently in our data than others because simpler events
occur as subevents of more complex events but not vice versa, we do not report FP and FN directly.
Instead, we normalize FP by dividing by the total number of times LEONARD detected the target
event correctly or incorrectly within all 210 movies and we normalize FN by dividing by the total
number of correct occurrences of the target event within all 210 movies (i.e., the human assessment
of the number of occurrences of the target event). The normalized value of FP estimates the
probability that the target event did not occur given that it was predicted to occur, while the normalized
value of FN estimates the probability that the event was not predicted to occur given that it did
occur.
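
As a small illustration of the normalization just described, the Python sketch below divides each raw count by its denominator. The function name, the parameter names, and the choice to return 0.0 when a denominator is zero are assumptions, and the counts in the usage line are made up.

```python
def normalized_rates(fp, fn, detections, occurrences):
    """Normalize raw error counts as described in the text.

    detections: times the target event was detected, correctly or incorrectly.
    occurrences: times the target event actually occurred (human assessment).
    """
    fp_rate = fp / detections if detections else 0.0   # assumption: 0.0 when nothing was detected
    fn_rate = fn / occurrences if occurrences else 0.0  # assumption: 0.0 when nothing occurred
    return fp_rate, fn_rate


# Made-up counts, for illustration only.
rates = normalized_rates(fp=5, fn=10, detections=50, occurrences=100)
```

The first component estimates the probability that a predicted event did not occur, and the second the probability that an event that occurred was missed.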

6.3 Results 

To evaluate our k-AMA learning approach, we ran leave-one-movie-out experiments, as described
above, for varying k, D, and N. The 210 example movies were recorded with color-coded objects to
provide complete object-correspondence information. We compared our learned event definitions to
the performance of two sets of hand-coded definitions. The first set HD1 of hand-coded definitions
appeared in Siskind (2001). In response to subsequent deeper understanding of the behavior of
LEONARD's model-reconstruction methods, we manually revised these definitions to yield another
set HD2 of hand-coded definitions that gives a significantly better FN performance at some cost
in FP performance. Appendix C gives the event definitions in HD1 and HD2 along with a set of
machine-generated definitions, produced by the k-AMA learning algorithm, given all training data
for k = 3 and D = BN.

6.3.1 Object Correspondence 

To evaluate our algorithm for finding object correspondences, we ignored the correspondence in- 
formation provided by color coding and applied the algorithm to all training models for each event 
type. The algorithm selected the correct correspondence for all 210 training models. Thus, for this 
data set, the learning results when no correspondence information is given will be identical to those 
where the correspondences are manually provided, except that, in the first case, the rules will not 
specify particular object roles, as discussed in section 5.3.2. Since our evaluation procedure uses 
role information, the rest of our experiments use the manual correspondence information, provided 
by color-coding, rather than computing it. 

While our correspondence technique was perfect in these experiments, it may not be suited to 
some event types. Furthermore, it is likely to produce more errors as noise levels increase. Since 
correspondence errors represent a form of noise and our learner makes no special provisions for 
handling noise, the results are likely to be poor when such errors are common. For example, in the 
worst case, it is possible for a single extremely noisy example to cause the LGG to be trivial (i.e.,
the formula true). In such cases, we will be forced to improve the noise tolerance of our learner. 

6.3.2 Varying k 

The first three rows of Table 1 show the FP and FN values for all 7 event types for k ∈ {2, 3, 4},
N = 29 (the maximum), and D = BN. Similar trends were found for D = P and D = N. The
general trend is that, as k increases, FP decreases or remains the same and FN increases or remains
the same. Such a trend is a consequence of our k-cover approach. This is because, as k increases,
the k-AMA language contains strictly more formulas. Thus for k1 > k2, the k1-cover of a formula
will never be more general than the k2-cover. This strongly suggests, but does not prove, that FP
will be non-increasing with k and FN will be non-decreasing with k.

Our results show that 2-AMA is overly general for put down and assemble, i.e., it gives high
FP. In contrast, 3-AMA achieves FP = 0 for each event type, but pays a penalty in FN compared
to 2-AMA. Since 3-AMA achieves FP = 0, there is likely no advantage in moving to k-AMA for
k > 3. That is, the expected result is for FN to become larger. This effect is demonstrated for
4-AMA in the table.

Table 1: FP and FN for learned definitions, varying both k and D, and for hand-coded definitions.

6.3.3 Varying D

Rows four through six of Table 1 show FP and FN for all 7 event types for D ∈ {P, BN, N}, N = 29,
and k = 3. Similar trends were observed for other values of k. The general trend is that, as the
degree of negative information increases, the learned event definitions become more specific. In
other words, FP decreases and FN increases. This makes sense since, as more negative information
is added to the training models, more specific structure can be found in the data and exploited by
the k-AMA formulas. We can see that, with D = P, the definitions for pick up and put down are
overly general, as they produce high FP. Alternatively, with D = N, the learned definitions are
overly specific, giving FP = 0, at the cost of high FN. In these experiments, as well as others, we
have found that D = BN yields the best of both worlds: FP = 0 for all event types and lower FN
than achieved with D = N.

Experiments not shown here have demonstrated that, without negation for pick up and put down, 
we can increase k arbitrarily, in an attempt to specialize the learned definitions, and never signif- 
icantly reduce FP. This indicates that negative information plays a particularly important role in 
constructing definitions for these event types.

6.3.4 Comparison to Hand-Coded Definitions

The bottom two rows of Table 1 show the results for HD1 and HD2. We have not yet attempted to
automatically select the parameters for learning (i.e., k and D). Rather, here we focus on comparing
the hand-coded definitions to the parameter set that we judged to be best performing across all event
types. We believe, however, that these parameters could be selected reliably using cross-validation
techniques applied to a larger data set. In that case, the parameters would be selected on a
per-event-type basis and would likely result in an even more favorable comparison to the hand-coded
definitions.

The results show that the learned definitions significantly outperform HD1 on the current data
set. The HD1 definitions were found to produce a large number of false negatives on the current
data set. Notice that, although HD2 produces significantly fewer false negatives for all event types, 
it produces more false positives for pick up and put down. This is because the hand definitions 
utilize pick up and put down as macros for defining the other events. 

The performance of the learned definitions is competitive with the performance of HD2. The
main differences in performance are: (a) for pick up and put down, the learned and HD2 definitions
achieve nearly the same FN but the learned definitions achieve FP = 0 whereas HD2 has significant
FP, (b) for unstack and disassemble, the learned definitions perform moderately worse than HD2
with respect to FN, and (c) the learned definitions perform significantly better than HD2 on assemble
events.

We conjecture that further manual revision could improve HD2 to perform as well as (and per- 
haps better than) the learned definitions for every event class. Nonetheless, we view this experiment 
as promising, as it demonstrates that our learning technique is able to compete with, and sometimes 
outperform, significant hand-coding efforts by one of the authors. 

6.3.5 Varying N 

It is of practical interest to know how training-set size affects our algorithm's performance. For this 
application, it is important that our method work well with fairly small data sets, as it can be tedious 
to collect event data. Table 2 shows the FN of our learning algorithm for each event type, as N is 
reduced from 29 to 5. For these experiments, we used k = 3 and D = BN. Note that FP = 0
for all event types and all N and hence is not shown. We expect FN to increase as N is decreased,
since, with specific-to-general learning, more data yields more-general definitions. Generally, FN
is flat for N > 20, increases slowly for 10 < N < 20, and increases abruptly for 5 < N < 10. We
also see that, for several event types, FN decreases slowly, as N is increased from 20 to 29. This 
indicates that a larger data set might yield improved results for those event types. 
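
The trend described above can be checked directly against the values reported in Table 2. The short Python sketch below transcribes those values and tests that FN never decreases as N shrinks; the variable and function names are illustrative only.

```python
# FN values transcribed from Table 2 (k = 3, D = BN), ordered N = 29, 25, 20, 15, 10, 5.
TRAINING_SIZES = [29, 25, 20, 15, 10, 5]
FN = {
    "pick up":     [0.0, 0.0, 0.01, 0.01, 0.07, 0.22],
    "put down":    [0.20, 0.20, 0.21, 0.22, 0.27, 0.43],
    "stack":       [0.45, 0.47, 0.50, 0.53, 0.60, 0.77],
    "unstack":     [0.10, 0.16, 0.17, 0.26, 0.36, 0.54],
    "move":        [0.03, 0.05, 0.08, 0.14, 0.23, 0.35],
    "assemble":    [0.07, 0.09, 0.12, 0.20, 0.32, 0.57],
    "disassemble": [0.10, 0.10, 0.12, 0.16, 0.26, 0.43],
}


def never_decreases(values):
    """True if each value is at least as large as the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


# For every event type, FN should not drop as the training set gets smaller.
monotone = {event: never_decreases(values) for event, values in FN.items()}
```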

6.3.6 Perspicuity of Learned Definitions 

One motivation for using a logic-based event representation is to support perspicuity; in this respect
our results are mixed. We note that perspicuity is a fuzzy and subjective concept. Realizing this,
we will say that an event definition is perspicuous if most humans with knowledge of the language
would find the definition to be "natural." Here, we do not assume the human has a detailed knowledge
of the model-reconstruction process that our learner is trying to fit. Adding that assumption
would presumably make the definitions qualify as more perspicuous, as many of the complex features
of the learned definitions appear in fact to be due to idiosyncrasies of the model-reconstruction
process. In this sense, we are evaluating the perspicuity of the output of the entire system, not just
of the learner itself, so that a key route to improving perspicuity in this sense would be to improve
the intuitive properties of the model-reconstruction output without any change to the learner.

While the learned and hand-coded definitions are similar with respect to accuracy, typically the 
learned definitions are much less perspicuous. For our simplest event types, however, the learned 
definitions are arguably perspicuous. Below we look at this issue in more detail. Appendix C gives 
the hand-coded definitions in HD1 and HD2 along with a set of machine-generated definitions. The
learned definitions correspond to the output of our k-AMA learner when run on all 30 training
movies from each event type with k = 3 and D = BN (i.e., our best performing configuration with
respect to accuracy). 

Perspicuous Definitions. The PICKUP(x, y, z) and PUTDOWN(x, y, z) definitions are of particular
interest here since short state sequences appear adequate for representing these event types;
thus, we can hope for perspicuous 3-AMA definitions. In fact, the hand-coded definitions involve
short sequences. Consider the hand-coded definitions of PICKUP(x, y, z): the definitions
can roughly be viewed as 3-MA timelines of the form begin;trans;end.^14 State begin asserts facts
that indicate y is on z and is not being held by x, and end asserts facts that indicate y is being held by
x and is not on z. State trans is intended to model the fact that LEONARD's model-reconstruction
process does not always handle the transition between begin and end smoothly (so the definition
begin;end does not work well). We can make similar observations for PUTDOWN(x, y, z).
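
To make the begin;trans;end reading concrete, here is a minimal Python sketch of matching a timeline against a sequence of frames. It rests on simplifying assumptions that are not taken from the text: time is discrete, a frame is the set of propositions true in it, a state is a set of propositions that must all hold, and a timeline matches when the frames can be split into consecutive non-empty segments, one per state. The proposition names and function names are illustrative only.

```python
def satisfies_timeline(timeline, frames):
    """True if frames split into consecutive non-empty segments, one per state,
    with each state's propositions holding in every frame of its segment."""
    if not timeline or not frames:
        return False
    # `active` holds indices of states that could be "current" after each frame.
    active = {0} if timeline[0] <= frames[0] else set()
    for frame in frames[1:]:
        nxt = set()
        for j in active:
            if timeline[j] <= frame:  # stay in the same state
                nxt.add(j)
            if j + 1 < len(timeline) and timeline[j + 1] <= frame:  # advance
                nxt.add(j + 1)
        active = nxt
    return len(timeline) - 1 in active


def satisfies_ama(timelines, frames):
    """An AMA formula is a conjunction of timelines: all of them must match."""
    return all(satisfies_timeline(t, frames) for t in timelines)


begin = frozenset({"Supports(z,y)"})
trans = frozenset()  # places no constraint on the transition frames
end = frozenset({"Attached(x,y)"})
pick_up = [begin, trans, end]

# A toy movie whose third frame satisfies neither begin nor end.
movie = [{"Supports(z,y)"}, {"Supports(z,y)"}, set(), {"Attached(x,y)"}]
```

On this toy movie the three-state timeline matches, while the two-state timeline [begin, end] does not, because the third frame fits neither state. Under the assumptions above, this is one way to picture why an explicit trans state helps.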

Figure 15 gives the learned 3-AMA definitions of PICKUP(x, y, z) and PUTDOWN(x, y, z);
the definitions contain six and two 3-MA timelines respectively. Since the definitions consist of
multiple parallel timelines, they may at first not seem perspicuous. However, a closer examination
reveals that, in each definition, there is a single timeline that is arguably perspicuous; we have
placed these perspicuous timelines at the beginning of each definition. The perspicuous timelines
have a natural begin;trans;end interpretation. In fact, they are practically equivalent to the definitions
of PICKUP(x, y, z) and PUTDOWN(x, y, z) in HD2.^15

With this in mind, notice that the HD2 definitions are overly general as indicated by significant 
false positive rates. The learned definitions, however, yield no false positives without a significant 
increase in false negatives. The learned definitions improve upon HD2 by essentially specializing 
the HD2 definitions (i.e., the perspicuous timelines) by conjoining them with the non-perspicuous 
timelines. While these non-perspicuous timelines are often not intuitive, they capture patterns in the 
events that help rule out non-events. For example, in the learned definition of PICKUP(x, y, z) some
of the non-perspicuous timelines indicate that ATTACHED(y, z) is true during the transition period
of the event. Such an attachment relationship does not make intuitive sense. Rather, it represents a 
systematic error made by the model reconstruction process for pick up events. 

In summary, we see that the learned definitions of PICKUP(x, y, z) and PUTDOWN(x, y, z) each
contain a perspicuous timeline and one or more non-perspicuous timelines. The perspicuous timelines
give an intuitive definition of the events, whereas the non-perspicuous timelines capture
non-intuitive aspects of the events and model reconstruction process that are important in practice. We
note that, for experienced users, the primary difficulty of hand-coding definitions for LEONARD is
determining which non-perspicuous properties must be included. Typically this requires many
iterations of trial and error. Our automated technique can relieve the user of this task. Alternatively,
we could view the system as providing guidance for this task.

14. Note that the event-logic definition for PICKUP(x, y, z) in HD2 is written in a more compact form than 3-MA, but
this definition can be converted to 3-MA (and hence 3-AMA). In contrast, HD1 cannot be translated exactly to 3-MA
since it uses disjunction: it is the disjunction of two 3-MA timelines.

15. The primary difference is that the HD2 definitions contain more negated propositions. The learner only considers a
proposition and its negation if the proposition is true at some point during the training movies. Many of the negated
propositions in HD2 never appear positively, thus they are not included in the learned definitions.

Large Definitions. The STACK(w, x, y, z) and UNSTACK(w, x, y, z) events are nearly identical
to put down and pick up respectively. The only difference is that now we are picking up from and
putting down onto a two-block (rather than single-block) tower (i.e., composed of blocks y and z).
Thus, here again we might expect there to be perspicuous 3-AMA definitions. However, we see that
the learned definitions for STACK(w, x, y, z) and UNSTACK(w, x, y, z) in Figures 16 and 17 involve
many more timelines than those for PICKUP(w, x, y) and PUTDOWN(w, x, y). Accordingly, the
definitions are quite overwhelming and much less perspicuous.

Despite the large number of timelines, these definitions have the same general structure as those 
for pick up and put down. In particular, they each contain a distinguished perspicuous timeline,
placed at the beginning of each definition, that is conjoined with many non-perspicuous timelines. 
It is clear that, as above, the perspicuous timelines have a natural begin;trans;end interpretation 
and, again, they are very similar to the definitions in HD2. In this case, however, the definitions 
in HD2 are not overly general (committing no false positives). Thus, here the inclusion of the 
non-perspicuous timelines has a detrimental effect since they unnecessarily specialize the definition
resulting in more false negatives. 

We suspect that a primary reason for the large number of non-perspicuous timelines relative 
to the definitions of pick up and put down stems from the increased difficulty of constructing 
force-dynamic models. The inclusion of the two block tower in these examples causes the model- 
reconstruction process to produce more unintended results, particularly during the transition periods 
of Stack and Unstack. The result is that often many unintuitive and physically incorrect patterns 
involving the three blocks and the hand are produced during the transition period. The learner 
captures these patterns roughly via the non-perspicuous timelines. It is likely that generalizing the 
definitions by including more training examples would filter out some of these timelines, making the 
overall definition more perspicuous. Alternatively, it is of interest to consider pruning the learned 
definitions. A straightforward way to do this is to generate negative examples. Then with these, 
we could remove timelines (generalizing the definition) that do not contribute toward rejecting the 
negative examples. It is unclear how to prune definitions without negative examples. 
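
One possible reading of the pruning idea above is a greedy pass that drops any timeline whose removal still leaves every negative example rejected. The Python sketch below is hypothetical: the text does not specify a pruning algorithm, and the function name, the covers callback, and the set-based toy timelines are all assumptions.

```python
def prune_timelines(timelines, negatives, covers):
    """Greedily drop timelines that are not needed to reject any negative example.

    A definition is read as the conjunction of its timelines, so it rejects an
    example as soon as one timeline fails to cover it.
    covers(timeline, example) -> bool is supplied by the caller.
    """
    kept = list(timelines)
    for candidate in list(timelines):
        trial = [t for t in kept if t is not candidate]
        still_rejects_all = all(
            any(not covers(t, example) for t in trial) for example in negatives
        )
        if still_rejects_all:
            kept = trial  # candidate contributed nothing toward rejecting negatives
    return kept


# Toy setting: a "timeline" is a set of required propositions, and it covers an
# example when all of them are present.
timelines = [frozenset({"a"}), frozenset({"b"}), frozenset({"c"})]
negatives = [frozenset({"a", "c"})]  # rejected only by the timeline requiring "b"
pruned = prune_timelines(timelines, negatives, lambda t, x: t <= x)
```

The result depends on the order in which timelines are tried, and with an empty set of negative examples the sketch drops every timeline, so it offers nothing for the case without negative examples.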

Hierarchical Events. MOVE(w, x, y, z), ASSEMBLE(w, x, y, z), and DISASSEMBLE(w, x, y, z)
are inherently hierarchical, being composed of the four simpler event types. The hand-coded definitions
leverage this structure by utilizing the simpler definitions as macros. In this light, it should
be clear that, when viewed non-hierarchically (as our learner does), these events involve relatively
long state sequences. Thus, 3-AMA is not adequate for writing down perspicuous definitions. In
spite of this representational shortcoming, our learned 3-AMA definitions perform quite well. This
performance supports one of our arguments for using AMA from Section 3.2. Namely, given that it
is easier to find short rather than long sequences, a practical approach to finding definitions for long
events is to conjoin the short sequences within those events. Examining the timelines of the learned
3-AMA definitions reveals what we might expect. Each timeline captures an often understandable
property of the long event sequence, but the conjunction of those timelines cannot be considered
to be a perspicuous definition. A future direction is to utilize hierarchical learning techniques to
improve the perspicuity of our definitions while maintaining accuracy.

 N   pick up  put down  stack  unstack  move  assemble  disassemble
29   0.0      0.20      0.45   0.10     0.03  0.07      0.10
25   0.0      0.20      0.47   0.16     0.05  0.09      0.10
20   0.01     0.21      0.50   0.17     0.08  0.12      0.12
15   0.01     0.22      0.53   0.26     0.14  0.20      0.16
10   0.07     0.27      0.60   0.36     0.23  0.32      0.26
 5   0.22     0.43      0.77   0.54     0.35  0.57      0.43

Table 2: FN for k = 3, D = BN, and various values of N.



We note, however, that, at some level, the learned definition of MOVE(w, x, y, z) given in
Figure 18 is perspicuous. In particular, the first 3-MA timeline is naturally interpreted as giving the
pre- and post-conditions for a move action. That is, initially x is supported by y and the hand w is 
empty and finally x is supported by z and the hand w is empty. Thus, if all we care about is pre- 
and post-conditions, we might consider this timeline to be perspicuous. The remaining timelines in 
the definition capture pieces of the internal event structure such as facts indicating that x is moved 
by the hand. A weaker case can be made for assemble and disassemble. The first timeline in each 
of the learned definitions in Figures 19 and 20 can be interpreted as giving pre- and post-conditions. 
However, in these cases, the pre(post)-conditions for assemble (disassemble) are quite incomplete.
The incompleteness is due to the inclusion of examples where the model-reconstruction process did
not properly handle the initial (final) moments.

7. Related Work 

Here we discuss two bodies of related work. First, we present previous work in visual event recogni- 
tion and how it relates to our experiments here. Second, we discuss previous approaches to learning 
temporal patterns from positive data. 

7.1 Visual Event Recognition 

Our system is unique in that it combines positive-only learning with a temporal, relational, and 
force-dynamic representation to recognize events from real video. Prior work has investigated vari- 
ous subsets of the features of our system — but, to date, no system has combined all of these pieces 
together. Incorporating any one of these pieces into a system is a significant endeavor. In this re- 
spect, there are no competing approaches to directly compare our system against. Given this, the 
following is a representative list of systems that have common features with ours. It is not meant to 
be comprehensive and focuses on pointing out the primary differences between each of these sys- 
tems and ours, as these primary differences actually render these systems only very loosely related 
to ours. 

Borchardt (1985) presents a representation for temporal, relational, force-dynamic event defi- 
nitions but these definitions are neither learned nor applied to video. Regier (1992) presents tech- 
niques for learning temporal event definitions but the learned definitions are neither relational, force 
dynamic, nor applied to video. In addition the learning technique is not truly positive-only — rather, 
it extracts implicit negative examples of an event type from positive examples of other event types.
Yamoto, Ohya, and Ishii (1992), Brand and Essa (1995), Siskind and Morris (1996), Brand, Oliver,
and Pentland (1997), and Bobick and Ivanov (1998) present techniques for learning temporal event 
definitions from video but the learned definitions are neither relational nor force dynamic. Pinhanez 
and Bobick (1995) and Brand (1997a) present temporal, relational event definitions that recognize 
events in video but these definitions are neither learned nor force dynamic. Brand (1997b) and Mann 
and Jepson (1998) present techniques for analyzing force dynamics in video but neither formulate 
event definitions nor apply these techniques to recognizing events or learning event definitions. 

7.2 Learning Temporal Patterns 

We divide this body of work into three main categories: temporal data mining, inductive logic 
programming, and finite-state-machine induction. 

Temporal Data Mining. The sequence-mining literature contains many general-to-specific ("lev- 
elwise") algorithms for finding frequent sequences (Agrawal & Srikant, 1995; Mannila, Toivonen, 
& Verkamo, 1995; Kam & Fu, 2000; Cohen, 2001; Hoppner, 2001). Here we explore a specific-to- 
general approach. In this previous work, researchers have studied the problem of mining temporal 
patterns using languages that are interpreted as placing constraints on partially or totally ordered 
sets of time points, e.g., sequential patterns (Agrawal & Srikant, 1995) and episodes (Mannila et al., 
1995). These languages place constraints on time points rather than time intervals as in our work 
here. More recently there has been work on mining temporal patterns using interval-based pattern 
languages (Kam & Fu, 2000; Cohen, 2001; Hoppner, 2001). 

Though the languages and learning frameworks vary among these approaches, they share two 
central features which distinguish them from our approach. First, they all typically have the goal 
of finding all frequent patterns (formulas) within a temporal data set — our approach is focused 
on finding patterns with a frequency of one (covering all positive examples). Our first learning 
application of visual-event recognition has not yet required us to find patterns with frequency less 
than one. However, there are a number of ways in which we can extend our method in that direction 
when it becomes necessary (e.g., to deal with noisy training data). Second, these approaches all 
use standard general-to-specific level-wise search techniques, whereas we chose to take a specific- 
to-general approach. One direction for future work is to develop a general-to-specific level-wise 
algorithm for finding frequent MA formulas and to compare it with our specific-to-general approach. 
Another direction is to design a level-wise version of our specific-to-general algorithm — where for 
example, the results obtained for the k-AMA LGG can be used to more efficiently calculate the 
(k + 1)-AMA LGG. Whereas a level-wise approach is conceptually straightforward in a
general-to-specific framework, it is not so clear in the specific-to-general case. We are not familiar with other
temporal data-mining systems that take a specific-to-general approach. 

First-Order Learning. In Section 3.3, we pointed out difficulties in using existing first-order
clausal generalization techniques for learning AMA formulas. In spite of these difficulties, it is still
possible to represent temporal events in first-order logic (either with or without capturing the AMA
semantics precisely) and to apply general-purpose relational learning techniques, e.g., inductive
logic programming (ILP) (Muggleton & De Raedt, 1994). Most ILP systems require both positive
and negative training examples and hence are not suitable for our current positive-only framework.
Exceptions include GOLEM (Muggleton & Feng, 1992), Progol (Muggleton, 1995), and
Claudien (De Raedt & Dehaspe, 1997), among others. While we have not performed a full evaluation
of these systems, our early experiments in the visual-event recognition domain confirmed our belief
that Horn clauses, lacking special handling of time, give a poor inductive bias. In particular, many of
the learned clauses find patterns that simply do not make sense from a temporal perspective and, in
turn, generalize poorly. We believe a reasonable alternative to our approach may be to incorporate
syntactic biases into ILP systems as done, for example, in Cohen (1994), Dehaspe and De Raedt
(1996), Klingspor, Morik, and Rieger (1996). In this work, however, we chose to work directly in a
temporal logic representation.

         Subsumption                Semantic AMA LGG        Syntactic AMA LGG
Inputs   Semantic       Syntactic   Lower  Upper  Size      Lower  Upper  Size
MA       P              P           P      coNP   EXP       P      coNP   EXP
AMA      coNP-complete  P           coNP   NEXP   2-EXP?    P      coNP   EXP

Table 3: Complexity Results Summary. The LGG complexities are relative to input plus output size.
The size column reports the worst-case smallest correct output size. The "?" indicates a
conjecture.

Finite-State Machines. Finally, we note there has been much theoretical and empirical research
into learning finite-state machines (FSMs) (Angluin, 1987; Lang, Pearlmutter, & Price, 1998). We 
can view FSMs as describing properties of strings (symbol sequences). In our case, however, we are 
interested in describing sequences of propositional models rather than just sequences of symbols. 
This suggests learning a type of "factored" FSM where the arcs are labeled by sets of propositions 
rather than by single symbols. Factored FSMs may be a natural direction in which to extend the 
expressiveness of our current language, for example by allowing repetition. We are not aware of 
work concerned with learning factored FSMs; however, it is likely that inspiration can be drawn
from symbol-based FSM-learning algorithms.

8. Conclusion 

We have presented a simple logic for representing temporal events called AMA and have shown 
theoretical and empirical results for learning AMA formulas. Empirically, we've given the first 
system for learning temporal, relational, force-dynamic event definitions from positive-only input 
and we have applied that system to learn such definitions from real video input. The resulting 
performance matches that of event definitions that are hand-coded with substantial effort by human 
domain experts. On the theoretical side, Table 3 summarizes the upper and lower bounds that
we have shown for the subsumption and generalization problems associated with this logic. In 
each case, we have provided a provably correct algorithm matching the upper bound shown. The 
table also shows the worst-case size that the smallest LGG could possibly take relative to the input 
size, for both AMA and MA inputs. The key results in this table are the polynomial-time MA 
subsumption and AMA syntactic subsumption, the coNP lower bound for AMA subsumption, the 
exponential size of LGGs in the worst case, and the apparently lower complexity of syntactic AMA 
LGG versus semantic LGG. We described how to build a learner based on these results and applied
it to the visual-event learning domain. To date, however, the definitions we learn are neither
cross-modal nor perspicuous. And while the performance of the learned definitions matches that of
hand-coded ones, we wish to surpass hand coding. In the future, we intend to address cross-modality by
applying our learning technique to the planning domain. We also believe that addressing perspicuity
will lead to improved performance.

Appendix A. Internal Positive Event Logic

Here we give the syntax and semantics for an event logic called Internal Positive Event Logic 
(IPEL). This logic is used in the main text only to motivate our choice of a small subset of this 
logic, AMA, by showing, in Proposition 4, that AMA can define any set of models that IPEL can 
define. 

An event type (i.e., set of models) is said to be internal if whenever it contains any model
ℳ = (M, I), it also contains any model that agrees with ℳ on truth assignments M[i] where i ∈ I.
Full event logic allows the definition of non-internal events, for example, the formula Ψ = ◇_{<} P
is satisfied by (M, I) when there is some interval I' entirely preceding I such that P is satisfied
by (M, I'), thus Ψ is not internal. The applications we are considering do not appear to require
non-internal events, thus we currently only consider events that are internal.

Call an event type positive if it contains the model ℳ = (M, [1, 1]) where M[1] is the truth
assignment assigning all propositions the value true. A positive event type cannot require any
proposition to be false at any point in time.

IPEL is a fragment of full propositional event logic that can only describe positive internal 
events. We conjecture, but have not yet proven, that all positive internal events representable in the 
full event logic of Siskind (2001) can be represented by some IPEL formula. Formally, the syntax 
of IPEL formulas is given by 

E ::= true | prop | E_1 ∨ E_2 | ◇_{R'} E_1 | E_1 ∧_R E_2,

where the E_i are IPEL formulas, prop is a primitive proposition (sometimes called a primitive event
type), R is a subset of the thirteen Allen interval relations {s,f,d,b,m,o,=,si,fi,di,bi,mi,oi} (Allen,
1983), and R' is a subset of the restricted set of Allen relations {s,f,d,=}. The semantics for each
Allen relation is given in Table 4. The difference between IPEL syntax and that of full propositional
event logic is that event logic allows for a negation operator, and that, in full event logic, R' can
be any subset of all thirteen Allen relations. The operators ∧ and ; used to define AMA formulas
are merely abbreviations for the IPEL operators ∧_{=} and ∧_{m} respectively, so AMA is a subset of
IPEL (though a distinguished subset as indicated by Proposition 4).

Each of the thirteen Allen interval relations is a binary relation on the set of closed
natural-number intervals. Table 4 gives the definitions of these relations, defining
[m_1, m_2] r [n_1, n_2] for each Allen relation r.

I_1          r    I_2          English    Definition                     Inverse
[m_1, m_2]   s    [n_1, n_2]   starts     m_1 = n_1 and m_2 ≤ n_2        si
[m_1, m_2]   f    [n_1, n_2]   finishes   m_1 ≥ n_1 and m_2 = n_2        fi
[m_1, m_2]   d    [n_1, n_2]   during     m_1 ≥ n_1 and m_2 ≤ n_2        di
[m_1, m_2]   b    [n_1, n_2]   before     m_2 ≤ n_1                      bi
[m_1, m_2]   m    [n_1, n_2]   meets      m_2 = n_1 or m_2 + 1 = n_1     mi
[m_1, m_2]   o    [n_1, n_2]   overlaps   m_1 ≤ n_1 ≤ m_2 ≤ n_2          oi
[m_1, m_2]   =    [n_1, n_2]   equals     m_1 = n_1 and m_2 = n_2        =

Table 4: The Thirteen Allen Relations (adapted to our semantics).

Satisfiability for IPEL formulas can now be defined as follows.



• true is satisfied by every model. 

• prop is satisfied by model (M, I) iff M[x] assigns prop true for every x ∈ I.

• E_1 ∨ E_2 is satisfied by a model ℳ iff ℳ satisfies E_1 or ℳ satisfies E_2.

• ◇_R E is satisfied by model (M, I) iff for some r ∈ R there is an interval I' such that I' r I
and (M, I') satisfies E.

• E_1 ∧_R E_2 is satisfied by model (M, I) iff for some r ∈ R there exist intervals I_1 and I_2 such
that I_1 r I_2, Span(I_1, I_2) = I, and both (M, I_1) satisfies E_1 and (M, I_2) satisfies E_2.

where prop is a primitive proposition, E and E_i are IPEL formulas, R is a set of Allen relations, and
Span(I_1, I_2) is the minimal interval that contains both I_1 and I_2. From this definition, it is easy to
show, by induction on the number of operators and connectives in a formula, that all IPEL formulas
define internal events. One can also verify that the definition of satisfiability given earlier for AMA
formulas corresponds to the one we give here.
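
The satisfiability definition above can be prototyped by brute-force search over intervals. The following Python sketch is an illustration under stated assumptions, not the implementation used in our experiments. It assumes a model is a list of sets of true propositions indexed from 0, encodes formulas as nested tuples (a hypothetical encoding chosen for this example), reads the Table 4 definitions with non-strict inequalities, and searches only sub-intervals for the ◇ operator, which is adequate only when its relation set is a subset of {s,f,d,=}.

```python
from itertools import product

# Allen relations on closed integer intervals m = (m1, m2), n = (n1, n2).
# Assumption: the non-strict ("adapted") reading of the Table 4 definitions.
BASE = {
    "s": lambda m, n: m[0] == n[0] and m[1] <= n[1],
    "f": lambda m, n: m[0] >= n[0] and m[1] == n[1],
    "d": lambda m, n: m[0] >= n[0] and m[1] <= n[1],
    "b": lambda m, n: m[1] <= n[0],
    "m": lambda m, n: m[1] == n[0] or m[1] + 1 == n[0],
    "o": lambda m, n: m[0] <= n[0] <= m[1] <= n[1],
    "=": lambda m, n: m == n,
}

def allen(r, i1, i2):
    """True iff i1 r i2; names ending in 'i' (si, fi, ...) are inverses."""
    if r in BASE:
        return BASE[r](i1, i2)
    return BASE[r[:-1]](i2, i1)

def subintervals(interval):
    lo, hi = interval
    return [(a, b) for a in range(lo, hi + 1) for b in range(a, hi + 1)]

def span(i1, i2):
    """Minimal interval containing both i1 and i2."""
    return (min(i1[0], i2[0]), max(i1[1], i2[1]))

def satisfies(model, interval, formula):
    """Hypothetical encoding: ('true',), ('prop', p), ('or', E1, E2),
    ('dia', R, E) for the diamond operator, ('and', R, E1, E2)."""
    kind = formula[0]
    if kind == "true":
        return True
    if kind == "prop":
        lo, hi = interval
        return all(formula[1] in model[x] for x in range(lo, hi + 1))
    if kind == "or":
        return (satisfies(model, interval, formula[1])
                or satisfies(model, interval, formula[2]))
    if kind == "dia":  # only sub-intervals are searched: R within {s,f,d,=}
        _, rels, sub = formula
        return any(allen(r, i, interval) and satisfies(model, i, sub)
                   for r in rels for i in subintervals(interval))
    if kind == "and":  # Span(I1, I2) = I forces both intervals inside I
        _, rels, e1, e2 = formula
        cands = subintervals(interval)
        return any(allen(r, i1, i2) and span(i1, i2) == interval
                   and satisfies(model, i1, e1) and satisfies(model, i2, e2)
                   for r in rels for i1, i2 in product(cands, cands))
    raise ValueError("unknown formula kind: %r" % (kind,))

# Usage: P holds at times 0-1 and Q at times 2-3, so "P ; Q" (the 'and'
# operator restricted to the meets relation) holds on the interval (0, 3).
model = [{"P"}, {"P"}, {"Q"}, {"Q"}]
p_then_q = ("and", ["m"], ("prop", "P"), ("prop", "Q"))
print(satisfies(model, (0, 3), p_then_q))
```

The search is exponential in formula depth and quadratic in the number of sub-intervals per operator, so it is meant only to make the semantics concrete on small models.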

Appendix B. Omitted Proofs 

Lemma 1. For any MA timeline Φ and any model ℳ, if ℳ satisfies Φ then there is a witnessing
interdigitation for MAP(ℳ) ≤ Φ.

Proof: Assume that ℳ = (M, I) satisfies the MA timeline Φ = s_1; ...; s_n, and let Φ' =
MAP(ℳ). It is straightforward to argue, by induction on the length of Φ, that there exists a mapping
V' from states of Φ to sub-intervals of I, such that

• for any i ∈ V'(s), M[i] satisfies s,

• V'(s_1) includes the initial time point of I,

• V'(s_n) includes the final time point of I, and

• for any i ∈ [1, n − 1], we have V'(s_i) meets V'(s_{i+1}) (see Table 4).

Let V be the relation between states s ∈ Φ and members i ∈ I that is true when i ∈ V'(s). Note
that the conditions on V' ensure that every s ∈ Φ and every i ∈ I appear in some tuple in V (not
necessarily together). Below we use V to construct a witnessing interdigitation W.

Let R be the total, one-to-one, onto function from time-points in I to corresponding states in
Φ', noting that Φ' has one state for each time-point in I, as Φ' = MAP((M, I)). Note that R preserves
ordering in that, when i ≤ j, R(i) is no later than R(j) in Φ'. Let W be the composition V ∘ R of
the relations V and R.

We show that W is an interdigitation. We first show that each state from Φ or Φ' appears in a
tuple in W, so W is piecewise total. States from Φ must appear, trivially, because each appears in a
tuple of V, and R is total. States from Φ' appear because each i ∈ I appears in a tuple of V, and R
is onto the states of Φ'.

It now suffices to show that for any states s before t from Φ, W(s, s') and W(t, t') implies that
s' is no later than t' in Φ', so that W is simultaneously consistent. The conditions defining V' above
imply that every number i ∈ V(s) is less than or equal to every j ∈ V(t). The order-preservation
property of R, noted above, then implies that every state s' ∈ V ∘ R(s) is no later than any state
t' ∈ V ∘ R(t) in Φ', as desired. So W is an interdigitation.

We now argue that W witnesses Φ' ≤ Φ. Consider s ∈ Φ and t ∈ Φ' such that W(s, t). By the
construction of W, there must be i ∈ V'(s) for which t is the i'th state of Φ'. Since Φ' = MAP(ℳ),
it follows that t is the set of true propositions in M[i]. Since i ∈ V'(s), we know that M[i] satisfies
s. It follows that s ⊆ t, and so t ≤ s. □

Lemma 3. For any E ∈ IPEL, if model ℳ embeds a model that satisfies E then ℳ satisfies E.

Proof: Consider the models ℳ = (M, I) and ℳ' = (M', I') such that ℳ embeds ℳ', and let
Φ = MAP(ℳ) and Φ' = MAP(ℳ'). Assume that E ∈ IPEL is satisfied by ℳ'; we will show that
E is also satisfied by ℳ.

We know from the definition of embedding that Φ ≤ Φ' and thus there is a witnessing
interdigitation W for Φ ≤ Φ' by Proposition 2. We know there is a one-to-one correspondence between
numbers in I (I') and states of Φ (Φ') and denote the state in Φ (Φ') corresponding to i ∈ I (i' ∈ I')
as s_i (t_{i'}). This correspondence allows us to naturally interpret W as a mapping V from subsets of
I' to subsets of I as follows: for I'_1 ⊆ I', V(I'_1) equals the set of all i ∈ I such that for some
i' ∈ I'_1, s_i co-occurs with t_{i'} in W. We will use the following properties of V:

1. If I'_1 is a sub-interval of I', then V(I'_1) is a sub-interval of I.

2. If I'_1 is a sub-interval of I', then (M, V(I'_1)) embeds (M', I'_1).

3. If I'_1 and I'_2 are sub-intervals of I', and r is an Allen relation, then I'_1 r I'_2 iff
V(I'_1) r V(I'_2).

4. If I'_1 and I'_2 are sub-intervals of I', then V(Span(I'_1, I'_2)) = Span(V(I'_1), V(I'_2)).

5. V(I') = I.

We sketch the proofs of these properties. 1) Use induction on the length of I'_1, with the
definition of interdigitation. 2) Since V(I'_1) is an interval, MAP((M, V(I'_1))) is well defined.
MAP((M, V(I'_1))) ≤ MAP((M', I'_1)) follows from the assumption that ℳ embeds ℳ'. 3) From
Appendix A, we see that all Allen relations are defined in terms of the ≤ relation on the natural
number endpoints of the intervals. We can show that V preserves ≤ (but not <) on singleton sets
(i.e., every member of V({i'}) is ≤ every member of V({j'}) when i' ≤ j') and that V commutes
with set union. It follows that V preserves the Allen interval relations. 4) Use the fact that
V preserves ≤ in the sense just argued, along with the fact that Span(I'_1, I'_2) depends only on the
minimum and maximum numbers in I'_1 and I'_2. 5) Follows from the definition of interdigitation and
the construction of V.

We now use induction on the number of operators and connectives in E to prove that, if ℳ'
satisfies E, then so must ℳ. The base case is when E = prop, where prop is a primitive proposition,
or true. Since ℳ' satisfies E, we know that prop is true in all M'[x'] for x' ∈ I'. Since W witnesses
Φ ≤ Φ', we know that, if prop is true in M'[x'], then prop is true in all M[x], where x ∈ V({x'}).
Therefore, since V(I') = I, prop is true for all M[x], where x ∈ I, hence ℳ satisfies E.

For the inductive case, assume that the claim holds for IPEL formulas with fewer than N
operators and connectives, and let E_1, E_2 be two such formulas. When E = E_1 ∨ E_2, the claim trivially
holds. When E = ◇_R E_1, R must be a subset of the set of relations {s,f,d,=}. Notice that E can
be written as a disjunction of ◇_r E_1 formulas, where r is a single Allen relation from R. Thus, it
suffices to handle the case where R is a single Allen relation. Suppose E = ◇_{s} E_1. Since ℳ'
satisfies E, there must be a sub-interval I'_1 of I' such that I'_1 s I' and (M', I'_1) satisfies E_1. Let
I_1 = V(I'_1); we know from the properties of V that V(I') = I, and, hence, that I_1 s I.
Furthermore, we know that (M, I_1) embeds (M', I'_1), and, thus, by the inductive hypothesis, (M, I_1)
satisfies E_1. Combining these facts, we get that E is satisfied by ℳ. Similar arguments hold for
the remaining three Allen relations. Finally, consider the case when E = E_1 ∧_R E_2, where R can
be any set of Allen relations. Again, it suffices to handle the case when R is a single Allen relation
r. Since ℳ' satisfies E = E_1 ∧_r E_2, we know that there are sub-intervals I'_1 and I'_2 of I' such that
Span(I'_1, I'_2) = I', I'_1 r I'_2, (M', I'_1) satisfies E_1, and (M', I'_2) satisfies E_2. From these facts, and
the properties of V, it is easy to verify that ℳ satisfies E. □

Lemma 5. Given an MA formula Φ that subsumes each member of a set Σ of MA formulas, Φ
also subsumes some member Φ' of IG(Σ). Dually, when Φ is subsumed by each member of Σ, we
have that Φ is also subsumed by some member Φ' of IS(Σ). In each case, the length of Φ' can be
bounded by the size of Σ.

Proof: We prove the result for IG(Σ). The proof for IS(Σ) follows similar lines. Let Σ =
{Φ_1, ..., Φ_n}, Φ = s_1; ...; s_m, and assume that for each 1 ≤ i ≤ n, Φ_i ≤ Φ. From
Proposition 2, for each i, there is a witnessing interdigitation W_i for Φ_i ≤ Φ. We will combine the W_i
into an interdigitation of Σ, and show that the corresponding member of IG(Σ) is subsumed by Φ.

To construct an interdigitation of Σ, first notice that, for each s_j, each W_i specifies a set of
states (possibly a single state but at least one) from Φ_i that all co-occur with s_j. Furthermore, since
W_i is an interdigitation, it is easy to show that this set of states corresponds to a consecutive
subsequence of states from Φ_i; let Φ_{i,j} be the MA timeline corresponding to this subsequence. Now
let Σ_j = {Φ_{i,j} | 1 ≤ i ≤ n}, and α_j be any interdigitation of Σ_j. We now take I to be the union of
all α_j, for 1 ≤ j ≤ m. We show that I is an interdigitation of Σ. Since each state s appearing in Σ
must co-occur with at least one state s_j in Φ in at least one W_i, s will be in at least one tuple of α_j,
and, hence, be in some tuple of I, so I is piecewise total.

Now, define the restriction I^{i,j} of I to components i and j, with i < j, to be the relation given
by taking the set of all pairs formed by shortening tuples of I by omitting all components except
the i'th and the j'th. Likewise define α_k^{i,j} for each α_k. To show I is an interdigitation, it now suffices
to show that each I^{i,j} is simultaneously consistent. Consider states s_i and s_j from timelines Φ_i and
Φ_j, respectively, such that I^{i,j}(s_i, s_j). Suppose that t_i occurs after s_i in Φ_i, and for some t_j ∈ Φ_j,
I^{i,j}(t_i, t_j) holds. It suffices to show that s_j is no later than t_j in Φ_j. Since I^{i,j}(s_i, s_j) and
I^{i,j}(t_i, t_j), we must have α_k^{i,j}(s_i, s_j) and α_{k'}^{i,j}(t_i, t_j), respectively, for some k and k'. We know
k ≤ k' because s_i is before t_i in Φ_i and W_i is simultaneously consistent. If k = k', then s_j is no later
than t_j in Φ_j because α_k must be simultaneously consistent, being an interdigitation. Otherwise,
k < k'. Then s_j is no later than t_j in Φ_j, as desired, because W_j is simultaneously consistent. So I is
simultaneously consistent, and an interdigitation of Σ.

Let Φ' be the member of IG(Σ) corresponding to I. We now show that Φ' ≤ Φ. We know that
each state s' ∈ Φ' is the intersection of the states in a tuple of some α_j; we say that s' derives from
α_j. Consider the interdigitation I' between Φ and Φ', where I'(s_j, s'), for s_j ∈ Φ and s' ∈ Φ', if and
only if s' derives from α_j. I' is piecewise total, as every tuple of I derives from some α_j, and no α_j
is empty. I' is simultaneously consistent because tuples of I deriving from later α_j must be later in
the lexicographic ordering of I, given the simultaneous consistency of the W_k interdigitations used
to construct each α_j. Finally, we know that s_j subsumes (i.e., is a subset of) each state in each tuple
of α_j, because each W_k is a witnessing interdigitation to Φ_k ≤ Φ, and, hence, subsumes (is a subset
of) the intersection of those states. Therefore, if s_j ∈ Φ co-occurs with s' ∈ Φ' in I' we have that
s' ≤ s_j. Thus, I' is a witnessing interdigitation for Φ' ≤ Φ, and by Proposition 2 we have Φ' ≤ Φ.

The size bound on Φ' follows, since, as pointed out in the main text, the size of any member of
IG(Σ) is upper-bounded by the number of states in Σ. □
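
To make the objects in this proof concrete, the following Python sketch enumerates the interdigitations of two MA timelines and builds the corresponding members of IG (state-wise intersections) and IS (state-wise unions). It is an illustration under assumptions, not the algorithm analyzed in the main text: states are modeled as Python sets of proposition names, timelines as lists of states, and the sketch relies on the observation that a piecewise-total, simultaneously consistent relation between two timelines is a monotone staircase path of index pairs. The function names are hypothetical.

```python
def interdigitations(m, n):
    """Yield each interdigitation of timelines with m and n states as a
    list of 0-based index pairs (i, j), ordered from (0, 0) to (m-1, n-1)."""
    def extend(path):
        i, j = path[-1]
        if (i, j) == (m - 1, n - 1):
            yield list(path)
            return
        # Advance in the second timeline, the first, or both at once.
        for di, dj in ((0, 1), (1, 0), (1, 1)):
            a, b = i + di, j + dj
            if a < m and b < n:
                path.append((a, b))
                yield from extend(path)
                path.pop()
    yield from extend([(0, 0)])

def ig_members(phi1, phi2):
    """Interdigitation generalizations: intersect co-occurring states."""
    return [[phi1[i] & phi2[j] for i, j in w]
            for w in interdigitations(len(phi1), len(phi2))]

def is_members(phi1, phi2):
    """Interdigitation specializations: union co-occurring states."""
    return [[phi1[i] | phi2[j] for i, j in w]
            for w in interdigitations(len(phi1), len(phi2))]

# Usage: two 2-state timelines have three interdigitations.
phi1 = [{"a", "b"}, {"c"}]
phi2 = [{"a"}, {"c", "d"}]
for timeline in ig_members(phi1, phi2):
    print(timeline)
```

Each generated timeline has at most m + n − 1 states, which is consistent with the size bound stated at the end of the proof. The number of interdigitations grows exponentially with timeline length, so the enumeration is practical only for short timelines.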

Lemma 8. Given MA timelines Φ_1 = s_1; ...; s_m and Φ_2 = t_1; ...; t_n, there is a witnessing
interdigitation for Φ_1 ≤ Φ_2 iff there is a path in the subsumption graph SG(Φ_1, Φ_2) from v_{1,1} to
v_{m,n}.

Proof: Subsumption graph SG(Φ_1, Φ_2) is equal to (V, E) with V = {v_{i,j} | 1 ≤ i ≤ m, 1 ≤ j ≤ n}
and E = {(v_{i,j}, v_{i',j'}) | s_i ≤ t_j, s_{i'} ≤ t_{j'}, i ≤ i' ≤ i + 1, j ≤ j' ≤ j + 1}. Note that there is a
correspondence between vertices and state tuples, with vertex v_{i,j} corresponding to (s_i, t_j).

For the forward direction, assume that W is a witnessing interdigitation for Φ_1 ≤ Φ_2. We
know that, if the states s_i and t_j co-occur in W, then s_i ≤ t_j since W witnesses Φ_1 ≤ Φ_2. The
vertices corresponding to the tuples of W will be called co-occurrence vertices, and satisfy the
first condition for belonging to some edge in E (that s_i ≤ t_j). It follows from the definition of
interdigitation that v_{1,1} and v_{m,n} are both co-occurrence vertices. Consider a co-occurrence
vertex v_{i,j} not equal to v_{m,n}, and the lexicographically least co-occurrence vertex v_{i',j'} after v_{i,j}
(ordering vertices by ordering the pair of subscripts). We show that i, j, i', and j' satisfy the
requirements for (v_{i,j}, v_{i',j'}) ∈ E. If not, then either i' > i + 1 or j' > j + 1. If i' > i + 1, then
there can be no co-occurrence vertex v_{i+1,j''}, contradicting that W is piecewise total. If j' > j + 1,
then since W is piecewise total, there must be a co-occurrence vertex v_{i'',j+1}: but if i'' < i or
i'' > i', this contradicts the simultaneous consistency of W, and if i'' = i, this contradicts the
lexicographically least choice of v_{i',j'}. It follows that every co-occurrence vertex but v_{m,n} has an
edge to another co-occurrence vertex closer in Manhattan distance to v_{m,n}, and thus that there is a
path from v_{1,1} to v_{m,n}.

For the reverse direction, assume there is a path of vertices in SG(Φ_1, Φ_2) from v_{1,1} to v_{m,n}
given by v_{i_1,j_1}, v_{i_2,j_2}, ..., v_{i_r,j_r} with i_1 = j_1 = 1, i_r = m, j_r = n. Let W be the set of state
tuples corresponding to the vertices along this path. W must be simultaneously consistent with the
Φ_i orderings because our directed edges are all non-decreasing in the Φ_i orderings. W must be
piecewise total because no edge can cross more than one state transition in either Φ_1 or Φ_2, by the
edge set definition. So W is an interdigitation. Finally, the definition of the edge set E ensures
that each tuple (s_i, t_j) in W has the property s_i ≤ t_j, so that W is a witnessing interdigitation for
Φ_1 ≤ Φ_2, showing that Φ_1 ≤ Φ_2, as desired. □
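
The lemma suggests a direct reachability test for MA timeline subsumption. The Python sketch below is a minimal illustration under assumptions, not a transcription of our implementation: states are modeled as sets of proposition names, a state s satisfies s ≤ t exactly when every proposition of t also appears in s, and the function names are hypothetical. It runs a breadth-first search over the vertices of the subsumption graph, following only edges whose endpoints both satisfy the state-subsumption condition.

```python
from collections import deque

def state_le(s, t):
    """s <= t for states: t's propositions are a subset of s's."""
    return t <= s

def timeline_subsumed(phi1, phi2):
    """True iff the subsumption graph SG(phi1, phi2) has a path from
    v_{1,1} to v_{m,n}, i.e. a witnessing interdigitation exists."""
    m, n = len(phi1), len(phi2)

    def ok(i, j):
        return state_le(phi1[i], phi2[j])

    # Both end vertices must themselves satisfy s_i <= t_j.
    if not (ok(0, 0) and ok(m - 1, n - 1)):
        return False
    seen = {(0, 0)}
    queue = deque([(0, 0)])
    while queue:
        i, j = queue.popleft()
        if (i, j) == (m - 1, n - 1):
            return True
        # Edges advance each index by at most one (self-loops are skipped).
        for di, dj in ((0, 1), (1, 0), (1, 1)):
            a, b = i + di, j + dj
            if a < m and b < n and (a, b) not in seen and ok(a, b):
                seen.add((a, b))
                queue.append((a, b))
    return False

# Usage: a more specific timeline is subsumed by a more general one.
specific = [{"a", "b"}, {"a", "b"}, {"c"}]
general = [{"a"}, {"c"}]
print(timeline_subsumed(specific, general))
```

The search visits each of the m·n vertices at most once, so the test runs in time polynomial in the sizes of the two timelines.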

Lemma 10. Given some n, let Ψ be the conjunction of the timelines

⋃_{i=1}^{n} {(PROP_n; True_i; False_i; PROP_n), (PROP_n; False_i; True_i; PROP_n)}.

We have the following facts about truth assignments to the Boolean variables p_1, ..., p_n:

1. For any truth assignment A, PROP_n; s_A; PROP_n is semantically equivalent to a member
of IS(Ψ).

2. For each Φ ∈ IS(Ψ) there is a truth assignment A such that Φ ≤ PROP_n; s_A; PROP_n.

Proof: To prove the first part of the lemma, we construct an interdigitation I of Ψ such that the
corresponding member of IS(Ψ) is equivalent to PROP_n; s_A; PROP_n. Intuitively, we construct I
by ensuring that some tuple of I consists only of states of the form True_k or False_k that agree with
the truth assignment; the union of all the states in this tuple, taken by IS(Ψ), will equal s_A. Let
I = {T_0, T_1, T_2, T_3, T_4} be an interdigitation of Ψ with exactly five state tuples T_i. We assign the
states of each timeline of Ψ to the tuples as follows:

1. For any k, such that 1 ≤ k ≤ n and A(p_k) is true,

• for the timeline s_1; s_2; s_3; s_4 = Q; True_k; False_k; Q, assign each state s_i to tuple T_i,
and assign state s_1 to T_0 as well, and

• for the timeline s'_1; s'_2; s'_3; s'_4 = Q; False_k; True_k; Q, assign each state s'_i to tuple T_{i−1},
and state s'_4 to tuple T_4 as well.

2. For any k, such that 1 ≤ k ≤ n and A(p_k) is false, assign states to tuples as in item 1 while
interchanging the roles of True_k and False_k.

It should be clear that I is piecewise total and simultaneously consistent with the state orderings
in Ψ, and so is an interdigitation. The union of the states in each of T_0, T_1, T_3, and T_4 is equal to
PROP_n, since PROP_n is included as a state in each of those tuples. Furthermore, we see that the
union of the states in T_2 is equal to s_A. Thus, the member of IS(Ψ) corresponding to I is equal to
PROP_n; PROP_n; s_A; PROP_n; PROP_n, which is semantically equivalent to PROP_n; s_A; PROP_n, as
desired.

To prove the second part of the lemma, let Φ be any member of IS(Ψ). We first argue that
every state in Φ must contain either True_k or False_k for each 1 ≤ k ≤ n. For any k, since Ψ
contains PROP_n; True_k; False_k; PROP_n, every member of IS(Ψ) must be subsumed by PROP_n; True_k;
False_k; PROP_n. So, Φ is subsumed by PROP_n; True_k; False_k; PROP_n. But every state in PROP_n;
True_k; False_k; PROP_n contains either True_k or False_k, implying that so does Φ, as desired.

Next, we claim that for each 1 ≤ k ≤ n, either Φ ≤ True_k or Φ ≤ False_k, i.e., either all states
in Φ include True_k, or all states in Φ include False_k (and possibly both). To prove this claim, assume,
for the sake of contradiction, that, for some k, Φ ≰ True_k and Φ ≰ False_k. Combining this
assumption with our first claim, we see there must be states s and s' in Φ such that s contains True_k but
not False_k, and s' contains False_k but not True_k, respectively. Consider the interdigitation I of Ψ
that corresponds to Φ as a member of IS(Ψ). We know that s and s' are each equal to the union of
states in tuples T and T', respectively, of I. T and T' must each include one state from each timeline
s_1; s_2; s_3; s_4 = PROP_n; True_k; False_k; PROP_n and s'_1; s'_2; s'_3; s'_4 = PROP_n; False_k; True_k; PROP_n.
Clearly, since s does not include False_k, T includes the states s_2 and s'_3, and likewise T' includes
the states s_3 and s'_2. It follows that I is not simultaneously consistent with the state orderings in
s_1; s_2; s_3; s_4 and s'_1; s'_2; s'_3; s'_4, contradicting our choice of I as an interdigitation. This shows that
either Φ ≤ True_k or Φ ≤ False_k.

Define the truth assignment A such that for all 1 ≤ k ≤ n, A(p_k) if and only if Φ ≤ True_k.
Since, for each k, Φ ≤ True_k or Φ ≤ False_k, it follows that each state of Φ is subsumed by
s_A. Furthermore, since Φ begins and ends with PROP_n, it is easy to give an interdigitation of
Φ and PROP_n; s_A; PROP_n that witnesses Φ ≤ PROP_n; s_A; PROP_n. Thus, we have that Φ ≤
PROP_n; s_A; PROP_n. □

Lemma 16. Let Φ_1 and Φ_2 be as given on page 402, in the proof of Theorem 17, and let Ψ =
⋀ IG({Φ_1, Φ_2}). For any Ψ' whose timelines are a subset of those in Ψ that omits some square
timeline, we have Ψ < Ψ'.

Proof: Since the timelines in Ψ' are a subset of the timelines in Ψ, we know that Ψ ≤ Ψ'. It remains
to show that Ψ' ≰ Ψ. We show this by constructing a timeline that is covered by Ψ' but not by Ψ.

Let Φ = s_1; s_2; ...; s_{2n−1} be a square timeline in Ψ that is not included in Ψ'. Recall that each
s_i is a single proposition from the proposition set P = {p_{i,j} | 1 ≤ i ≤ n, 1 ≤ j ≤ n}, and that,
for consecutive states s_i and s_{i+1}, if s_i = p_{k,l}, then s_{i+1} is either p_{k+1,l} or p_{k,l+1}. Define a new
timeline Φ̄ = s̄_2; s̄_3; ...; s̄_{2n−2} with s̄_i = (P − s_i). We now show that Φ̄ ≰ Φ (so that Φ̄ ≰ Ψ), and
that, for any Φ' in Ψ − {Φ}, Φ̄ ≤ Φ' (so that Φ̄ ≤ Ψ').

For the sake of contradiction, assume that Φ̄ ≤ Φ; then there must be an interdigitation W
witnessing Φ̄ ≤ Φ. We show by induction on i that, for i ≥ 2, W(s_i, s̄_j) implies j > i. For the
base case, when i = 2, we know that s̄_2 ≰ s_2, since s_2 ⊈ s̄_2, and so W(s_2, s̄_2) is false, since
W witnesses subsumption. For the inductive case, assume the claim holds for all i' < i, and that
W(s_i, s̄_j). We know that s̄_i ≰ s_i, and thus i ≠ j. Because W is piecewise total, we must have
W(s_{i−1}, s̄_{j'}) for some j', and, by the induction hypothesis, we must have j' > i − 1. Since W is
simultaneously consistent with the s_i and s̄_j state orderings, and i − 1 < i, we have j' ≤ j. It
follows that j > i as desired. Given this claim, we see that s_{2n−2} cannot co-occur in W with any
state in Φ̄, contradicting the fact that W is piecewise total. Thus we have that Φ̄ ≰ Φ.

Let Φ' = s'_1; ...; s'_m be any timeline in Ψ − {Φ}; we now construct an interdigitation that
witnesses Φ̄ ≤ Φ'. Note that while Φ is assumed to be square, Φ' need not be. Let j be the smallest
index where s_j ≠ s'_j; since s_1 = s'_1 = p_{1,1}, and Φ ≠ Φ', we know that such a j must exist, and is
in the range 2 ≤ j ≤ m. We use the index j to guide our construction of an interdigitation. Let W
be an interdigitation of Φ̄ and Φ' with exactly the following co-occurring states (i.e., state tuples):

1. For 1 ≤ i ≤ j − 1, s̄_{i+1} co-occurs with s'_i.

2. For j ≤ i ≤ m, s̄_j co-occurs with s'_i.

3. For j + 1 ≤ i ≤ 2n − 2, s̄_i co-occurs with s'_m.

It is easy to check that W is both piecewise total and simultaneously consistent with the state
orderings in Φ̄ and Φ', and so is an interdigitation. We now show that W witnesses Φ̄ ≤ Φ' by
showing that all states in Φ̄ are subsumed by the states they co-occur with in W. For co-occurring
states s̄_{i+1} and s'_i corresponding to the first item above we have that s'_i = s_i; this implies that s'_i
is contained in s̄_{i+1}, giving that s̄_{i+1} ≤ s'_i. Now consider co-occurring states s̄_j and s'_i from the
second item above. Since Φ is square, choose k and l so that s_{j−1} = p_{k,l}; we have that s_j is either
p_{k+1,l} or p_{k,l+1}. In addition, since s_{j−1} = s'_{j−1} we have that s'_j is either p_{k+1,l}, p_{k,l+1}, or p_{k+1,l+1}
but that s_j ≠ s'_j. In any of these cases, we find that no state in Φ' after s'_j can equal s_j; this follows
by noting that the proposition indices never decrease across the timeline Φ'. We therefore have
that, for i ≥ j, s̄_j ≤ s'_i. Finally, for co-occurring states s̄_i and s'_m from item three above, we have
s̄_i ≤ s'_m since s'_m = p_{n,n}, which is in all states of Φ̄. Thus, we have shown that for all co-occurring
states in W, the state from Φ̄ is subsumed by the co-occurring state in Φ'. Therefore, W witnesses
Φ̄ ≤ Φ', which implies that Φ̄ ≤ Φ'. □

16. Note that if Φ were not required to be square then it is possible for s'_{j+1} to equal s_j, i.e.,
they could both equal p_{k+1,l+1}.

Lemma 26. For any model (M, I) ∈ 𝕄 and any Ψ ∈ AMA¬, Ψ covers (M, I) iff F[Ψ] covers
T[(M, I)].

Proof: Recall that 𝕄 is the set of models over propositions in the set P = {p_1, ..., p_n} and that
we assume AMA¬ uses only primitive propositions from P (possibly negated). We also have the
set of propositions P̄ = {p̄_1, ..., p̄_n}, and assume that formulas in AMA use only propositions in
P ∪ P̄ and that 𝕄̄ is the set of models over P ∪ P̄, where for each i, exactly one of p_i and p̄_i is
true at any time. Note that F[Ψ] is in AMA and that T[(M, I)] is in 𝕄̄. We prove the lemma via
straightforward induction on the structure of Ψ, proving the result for literals, then for states, then
for timelines, and finally for AMA¬ formulas.

To prove the result for literals, we consider two cases (the third case of true is trivial). First, Ψ
can be a single proposition p_i, so that Ψ' = F[p_i] = p_i. Consider any model (M, I) ∈ 𝕄 and let
(M', I) = T[(M, I)]. The following relationships yield the desired result.

Ψ covers (M, I) iff for each j ∈ I, M[j] assigns p_i true (by definition of satisfiability)
                iff for each j ∈ I, M'[j] assigns p_i true (by definition of T)
                iff Ψ' = p_i covers T[(M, I)] (by definition of satisfiability)

The second case is when Ψ is a negated proposition ¬◇p_i; here, we get that Ψ' = p̄_i. Let
(M, I) ∈ 𝕄 and (M', I) = T[(M, I)]. The following relationships yield the desired result.

Ψ covers (M, I) iff for each j ∈ I, M[j] assigns p_i false (by definition of satisfiability)
                iff for each j ∈ I, M'[j] assigns p̄_i true (by definition of T)
                iff Ψ' = p̄_i covers T[(M, I)] (by definition of satisfiability)

This proves the lemma for literals.

To prove the result for states, we use induction on the number k of literals in a state. The base
case is when k = 1 (the state is a single literal) and was proven above. Now assume that the lemma
holds for states with k or fewer literals and let Ψ = l_1 ∧ ⋯ ∧ l_{k+1} and (M, I) ∈ 𝕄. From the
inductive assumption we know that Ψ' = l_1 ∧ ⋯ ∧ l_k covers (M, I) iff F[Ψ'] covers T[(M, I)]. From
our base case we also know that l_{k+1} covers (M, I) iff F[l_{k+1}] covers T[(M, I)]. From these facts
and the definition of satisfiability for states, we get that Ψ covers (M, I) iff F[Ψ'] ∧ F[l_{k+1}] covers
T[(M, I)]. Clearly F has the property that F[Ψ'] ∧ F[l_{k+1}] = F[Ψ], showing that the lemma holds
for states.

To prove the result for timelines, we use induction on the number k of states in the timeline. The
base case is when k = 1 (the timeline is a single state) and was proven above. Now assume that the
lemma holds for timelines with k or fewer states. Let Ψ = s_1; ...; s_{k+1} and (M, [t, t']) ∈ 𝕄 with
(M', [t, t']) = T[(M, [t, t'])]. We have the following relationships.

Ψ covers (M, [t, t']) iff there exists some t'' ∈ [t, t'], such that s_1 covers (M, [t, t'']) and
                          Φ = s_2; ...; s_{k+1} covers either (M, [t'', t']) or (M, [t'' + 1, t'])
                      iff there exists some t'' ∈ [t, t'], such that F[s_1] covers (M', [t, t'']) and
                          F[Φ] covers either (M', [t'', t']) or (M', [t'' + 1, t'])
                      iff F[s_1]; F[Φ] covers (M', [t, t'])
                      iff F[Ψ] covers (M', [t, t'])

Here the first iff follows from the definition of satisfiability; the second follows from our inductive
hypothesis, our base case, and the fact that for I ⊆ [t, t'] we have T[(M, I)] = (M', I); the third
follows from the definition of satisfiability; and the fourth follows from the fact that F[s_1]; F[Φ] =
F[Ψ].

Finally, we prove the result for AMA¬ formulas, by induction on the number k of timelines
in the formula. The base case is when k = 1 (the formula is a single timeline) and was proven
above. Now assume that the lemma holds for AMA¬ formulas with k or fewer timelines
and let Ψ = Φ_1 ∧ ⋯ ∧ Φ_{k+1} and (M, I) ∈ 𝕄. From the inductive assumption, we know that
Ψ' = Φ_1 ∧ ⋯ ∧ Φ_k covers (M, I) iff F[Ψ'] covers T[(M, I)]. From our base case, we also
know that Φ_{k+1} covers (M, I) iff F[Φ_{k+1}] covers T[(M, I)]. From these facts and the definition of
satisfiability, we get that Ψ covers (M, I) iff F[Ψ'] ∧ F[Φ_{k+1}] covers T[(M, I)]. Clearly F has the
property that F[Ψ'] ∧ F[Φ_{k+1}] = F[Ψ], showing that the lemma holds for AMA¬ formulas. This
completes the proof. □
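
The correspondence in Lemma 26 can be checked mechanically on small cases. The Python sketch below is illustrative only and rests on several assumptions of this example rather than definitions from the main text: a literal is a pair (proposition, is_positive), a state is a set of literals, a timeline is a list of states, a formula is a list of timelines, a model is a list of sets of true propositions covering the whole interval, and the barred proposition for p is spelled "not_p". Timeline coverage follows the recursive characterization used in the proof above.

```python
def T(model, props):
    """Model over P -> model over P and barred P: 'not_p' is true
    exactly when p is false, so one of each pair holds at every time."""
    return [set(state) | {"not_" + p for p in props if p not in state}
            for state in model]

def F(formula):
    """Replace each negated literal (p, False) by the positive literal
    ('not_p', True); positive literals are left unchanged."""
    return [[{(p if pos else "not_" + p, True) for p, pos in state}
             for state in timeline]
            for timeline in formula]

def covers_state(state, model, lo, hi):
    """Every literal of the state holds at every time point in [lo, hi]."""
    return all((p in model[x]) == pos
               for p, pos in state for x in range(lo, hi + 1))

def covers_timeline(timeline, model, lo, hi):
    """s_1; rest covers [lo, hi] iff s_1 covers [lo, mid] and rest covers
    [mid, hi] or [mid + 1, hi] for some mid in [lo, hi]."""
    if len(timeline) == 1:
        return covers_state(timeline[0], model, lo, hi)
    rest = timeline[1:]
    return any(
        covers_state(timeline[0], model, lo, mid)
        and (covers_timeline(rest, model, mid, hi)
             or (mid + 1 <= hi and covers_timeline(rest, model, mid + 1, hi)))
        for mid in range(lo, hi + 1))

def covers(formula, model):
    """A formula (conjunction of timelines) covers the whole model."""
    return all(covers_timeline(tl, model, 0, len(model) - 1)
               for tl in formula)

# Usage: the formula "p ; not q" and a three-step model over {p, q}.
props = ["p", "q"]
formula = [[{("p", True)}, {("q", False)}]]
model = [{"p"}, {"p", "q"}, set()]
print(covers(formula, model), covers(F(formula), T(model, props)))
```

On this example both calls agree, as the lemma predicts; the same comparison can be repeated over all small models to gain confidence in an implementation of F and T.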



Appendix C. Hand-coded and Learned Definitions Used in Our Experiments 

Below we give the two sets of hand-coded definitions, HD_1 and HD_2, used in our experimental
evaluation. We also give a set of learned AMA event definitions for the same seven event types. The
learned definitions correspond to the output of our k-AMA learning algorithm, given all available
training examples (30 examples per event type), with k = 3 and D = BN. All the event definitions
are written in event logic, where ¬◇p denotes the negation of proposition p.

PICKUP(x, y, z) ≜
    ¬◇x = y ∧ ¬◇z = x ∧ ¬◇z = y ∧
    SUPPORTED(y) ∧ ¬◇ATTACHED(x, z) ∧
    [ [ ¬◇ATTACHED(x, y) ∧ ¬◇SUPPORTS(x, y) ∧
        SUPPORTS(z, y) ∧
        ¬◇SUPPORTED(x) ∧ ¬◇ATTACHED(y, z) ∧
        ¬◇SUPPORTS(y, x) ∧ ¬◇SUPPORTS(y, z) ∧
        ¬◇SUPPORTS(x, z) ∧ ¬◇SUPPORTS(z, x) ] ;
      [ ATTACHED(x, y) ∨ ATTACHED(y, z) ] ;
      [ ATTACHED(x, y) ∧ SUPPORTS(x, y) ∧
        ¬◇SUPPORTS(z, y) ∧
        ¬◇SUPPORTED(x) ∧ ¬◇ATTACHED(y, z) ∧
        ¬◇SUPPORTS(y, x) ∧ ¬◇SUPPORTS(y, z) ∧
        ¬◇SUPPORTS(x, z) ∧ ¬◇SUPPORTS(z, x) ] ]

PUTDOWN(x, y, z) ≜
    ¬◇x = y ∧ ¬◇z = x ∧ ¬◇z = y ∧
    SUPPORTED(y) ∧ ¬◇ATTACHED(x, z) ∧
    [ [ ATTACHED(x, y) ∧ SUPPORTS(x, y) ∧
        ¬◇SUPPORTS(z, y) ∧
        ¬◇SUPPORTED(x) ∧ ¬◇ATTACHED(y, z) ∧
        ¬◇SUPPORTS(y, x) ∧ ¬◇SUPPORTS(y, z) ∧
        ¬◇SUPPORTS(x, z) ∧ ¬◇SUPPORTS(z, x) ] ;
      [ ATTACHED(x, y) ∨ ATTACHED(y, z) ] ;
      [ ¬◇ATTACHED(x, y) ∧ ¬◇SUPPORTS(x, y) ∧
        SUPPORTS(z, y) ∧
        ¬◇SUPPORTED(x) ∧ ¬◇ATTACHED(y, z) ∧
        ¬◇SUPPORTS(y, x) ∧ ¬◇SUPPORTS(y, z) ∧
        ¬◇SUPPORTS(x, z) ∧ ¬◇SUPPORTS(z, x) ] ]

STACK(w, x, y, z) ≜
    ¬◇z = w ∧ ¬◇z = x ∧ ¬◇z = y ∧
    PUTDOWN(w, x, y) ∧ SUPPORTS(z, y) ∧
    ¬ATTACHED(z, y)

UNSTACK(w, x, y, z) ≜
    ¬◇z = w ∧ ¬◇z = x ∧ ¬◇z = y ∧
    PICKUP(w, x, y) ∧ SUPPORTS(z, y) ∧ ATTACHED(z, y)

MOVE(w, x, y, z) ≜ ¬◇y = z ∧ [PICKUP(w, x, y); PUTDOWN(w, x, z)]

ASSEMBLE(w, x, y, z) ≜ PUTDOWN(w, y, z) ∧_{<} STACK(w, x, y, z)

DISASSEMBLE(w, x, y, z) ≜ UNSTACK(w, x, y, z) ∧_{<} PICKUP(x, y, z)


Figure 12: The HD1 event-logic definitions for all seven event types.

PICKUP(x, y, z) ≜
    ¬◇x = y ∧ ¬◇z = x ∧ ¬◇z = y ∧
    SUPPORTED(y) ∧ ¬◇ATTACHED(x, z) ∧
    ( [ ¬◇ATTACHED(x, y) ∧ ¬◇SUPPORTS(x, y) ∧
        SUPPORTS(z, y) ∧ CONTACTS(z, y) ∧
        ¬◇SUPPORTED(x) ∧ ¬◇ATTACHED(y, z) ∧
        ¬◇SUPPORTS(y, x) ∧ ¬◇SUPPORTS(y, z) ∧
        ¬◇SUPPORTS(x, z) ∧ ¬◇SUPPORTS(z, x) ]
      ∧_{<,m}
      [ ATTACHED(x, y) ∧ SUPPORTS(x, y) ∧
        ¬◇SUPPORTS(z, y) ∧
        ¬◇SUPPORTED(x) ∧ ¬◇ATTACHED(y, z) ∧
        ¬◇SUPPORTS(y, x) ∧ ¬◇SUPPORTS(y, z) ∧
        ¬◇SUPPORTS(x, z) ∧ ¬◇SUPPORTS(z, x) ] )

PUTDOWN(x, y, z) ≜
    ¬◇x = y ∧ ¬◇z = x ∧ ¬◇z = y ∧
    SUPPORTED(y) ∧ ¬◇ATTACHED(x, z) ∧
    ( [ ATTACHED(x, y) ∧ SUPPORTS(x, y) ∧
        ¬◇SUPPORTS(z, y) ∧
        ¬◇SUPPORTED(x) ∧ ¬◇ATTACHED(y, z) ∧
        ¬◇SUPPORTS(y, x) ∧ ¬◇SUPPORTS(y, z) ∧
        ¬◇SUPPORTS(x, z) ∧ ¬◇SUPPORTS(z, x) ]
      ∧_{<,m}
      [ ¬◇ATTACHED(x, y) ∧ ¬◇SUPPORTS(x, y) ∧
        SUPPORTS(z, y) ∧ CONTACTS(z, y) ∧
        ¬◇SUPPORTED(x) ∧ ¬◇ATTACHED(y, z) ∧
        ¬◇SUPPORTS(y, x) ∧ ¬◇SUPPORTS(y, z) ∧
        ¬◇SUPPORTS(x, z) ∧ ¬◇SUPPORTS(z, x) ] )

Figure 13: Part I of the HD2 event-logic definitions.

STACK(w, x, y, z) ≜
    ¬◇w = x ∧ ¬◇y = w ∧ ¬◇y = x ∧
    ¬◇z = w ∧ ¬◇z = x ∧ ¬◇z = y ∧
    SUPPORTED(x) ∧ ¬◇ATTACHED(w, y) ∧
    ( [ ATTACHED(w, x) ∧ SUPPORTS(w, x) ∧
        ¬◇SUPPORTS(y, x) ∧
        SUPPORTS(z, y) ∧ CONTACTS(z, y) ∧
        ¬◇ATTACHED(z, y) ∧
        ¬◇SUPPORTED(w) ∧ ¬◇ATTACHED(x, y) ∧
        ¬◇SUPPORTS(x, w) ∧ ¬◇SUPPORTS(x, y) ∧
        ¬◇SUPPORTS(w, y) ∧ ¬◇SUPPORTS(y, w) ]
      ∧_{<,m}
      [ ¬◇ATTACHED(w, x) ∧ ¬◇SUPPORTS(w, x) ∧
        SUPPORTS(y, x) ∧ CONTACTS(y, x) ∧
        SUPPORTS(z, y) ∧ CONTACTS(z, y) ∧
        ¬◇ATTACHED(z, y) ∧
        ¬◇SUPPORTED(w) ∧ ¬◇ATTACHED(x, y) ∧
        ¬◇SUPPORTS(x, w) ∧ ¬◇SUPPORTS(x, y) ∧
        ¬◇SUPPORTS(w, y) ∧ ¬◇SUPPORTS(y, w) ] )

UNSTACK(w, x, y, z) ≜
    ¬◇w = x ∧ ¬◇y = w ∧ ¬◇y = x ∧
    ¬◇z = w ∧ ¬◇z = x ∧ ¬◇z = y ∧
    SUPPORTED(x) ∧ ¬◇ATTACHED(w, y) ∧
    ( [ ¬◇ATTACHED(w, x) ∧ ¬◇SUPPORTS(w, x) ∧
        SUPPORTS(y, x) ∧ CONTACTS(y, x) ∧
        SUPPORTS(z, y) ∧ CONTACTS(z, y) ∧
        ¬◇ATTACHED(z, y) ∧
        ¬◇SUPPORTED(w) ∧ ¬◇ATTACHED(x, y) ∧
        ¬◇SUPPORTS(x, w) ∧ ¬◇SUPPORTS(x, y) ∧
        ¬◇SUPPORTS(w, y) ∧ ¬◇SUPPORTS(y, w) ]
      ∧_{<,m}
      [ ATTACHED(w, x) ∧ SUPPORTS(w, x) ∧
        ¬◇SUPPORTS(y, x) ∧
        SUPPORTS(z, y) ∧ CONTACTS(z, y) ∧
        ¬◇ATTACHED(z, y) ∧
        ¬◇SUPPORTED(w) ∧ ¬◇ATTACHED(x, y) ∧
        ¬◇SUPPORTS(x, w) ∧ ¬◇SUPPORTS(x, y) ∧
        ¬◇SUPPORTS(w, y) ∧ ¬◇SUPPORTS(y, w) ] )

MOVE(w, x, y, z) ≜
    ¬◇y = z ∧ [PICKUP(w, x, y) ; PUTDOWN(w, x, z)]

ASSEMBLE(w, x, y, z) ≜
    PUTDOWN(w, y, z) ∧_{<} STACK(w, x, y, z)

DISASSEMBLE(w, x, y, z) ≜
    UNSTACK(w, x, y, z) ∧_{<} PICKUP(x, y, z)

Figure 14: Part II of the HD2 event-logic definitions.

PICKUP(x, y, z) ≜
    { [ SUPPORTED(y) ∧ SUPPORTS(z, y) ∧
        CONTACTS(y, z) ∧ ¬◇SUPPORTS(x, y) ∧
        ¬◇ATTACHED(x, y) ∧ ¬◇ATTACHED(y, z) ] ;
      SUPPORTED(y) ;
      [ SUPPORTED(y) ∧ SUPPORTS(x, y) ∧
        ATTACHED(x, y) ∧ ¬◇SUPPORTS(z, y) ∧
        ¬◇CONTACTS(y, z) ∧ ¬◇ATTACHED(y, z) ] } ∧
    { SUPPORTED(y) ;
      [ SUPPORTED(y) ∧ ATTACHED(x, y) ∧ ATTACHED(y, z) ] ;
      [ SUPPORTED(y) ∧ ATTACHED(x, y) ] } ∧
    { [ SUPPORTED(y) ∧ CONTACTS(y, z) ] ;
      [ SUPPORTED(y) ∧ ATTACHED(y, z) ] ;
      [ SUPPORTED(y) ∧ ATTACHED(x, y) ] } ∧
    { [ SUPPORTED(y) ∧ SUPPORTS(z, y) ∧
        CONTACTS(y, z) ∧ ¬◇SUPPORTS(x, y) ∧
        ¬◇ATTACHED(x, y) ∧ ¬◇ATTACHED(y, z) ] ;
      [ SUPPORTED(y) ∧ SUPPORTS(z, y) ] ;
      [ SUPPORTED(y) ∧ ATTACHED(x, y) ] } ∧
    { [ SUPPORTED(y) ∧ SUPPORTS(z, y) ] ;
      [ SUPPORTED(y) ∧ ATTACHED(x, y) ] ;
      [ SUPPORTED(y) ∧ SUPPORTS(x, y) ∧
        ATTACHED(x, y) ∧ ¬◇SUPPORTS(z, y) ∧
        ¬◇CONTACTS(y, z) ∧ ¬◇ATTACHED(y, z) ] }

PUTDOWN(x, y, z) ≜
    { [ SUPPORTED(y) ∧ SUPPORTS(x, y) ∧ ATTACHED(x, y) ∧
        ¬◇SUPPORTS(z, y) ∧ ¬◇CONTACTS(y, z) ∧
        ¬◇ATTACHED(y, z) ] ;
      SUPPORTED(y) ;
      [ SUPPORTED(y) ∧ SUPPORTS(z, y) ∧ CONTACTS(z, y) ∧
        ¬◇SUPPORTS(x, y) ∧ ¬◇ATTACHED(x, y) ] } ∧
    { [ SUPPORTED(y) ∧ ATTACHED(x, y) ] ;
      [ SUPPORTED(y) ∧ ATTACHED(x, y) ∧ ATTACHED(y, z) ] ;
      SUPPORTED(y) }

Figure 15: The learned 3-AMA definitions for PICKUP(x, y, z) and PUTDOWN(x, y, z).

    { [ SUPPORTED(y) ∧ ATTACHED(w, x) ∧ SUPPORTS(z, y) ∧ CONTACTS(y, z) ∧
        ¬◇SUPPORTS(x, y) ∧ ¬◇SUPPORTS(y, x) ∧ ¬◇CONTACTS(x, y) ∧ ¬◇ATTACHED(x, y) ] ;
      [ SUPPORTED(y) ] ;
      [ SUPPORTED(y) ∧ SUPPORTED(x) ∧ SUPPORTS(y, x) ∧ CONTACTS(x, y) ∧ CONTACTS(y, z) ∧
        ¬◇SUPPORTS(x, y) ∧ ¬◇ATTACHED(w, x) ∧ ¬◇ATTACHED(x, y) ∧ ¬◇ATTACHED(y, z) ] } ∧
    { [ SUPPORTED(y) ∧ ATTACHED(w, x) ] ;
      [ SUPPORTED(y) ∧ ATTACHED(x, y) ] ;
      [ SUPPORTED(y) ∧ SUPPORTED(x) ∧ SUPPORTS(y, x) ∧ CONTACTS(x, y) ] } ∧
    { [ SUPPORTED(y) ∧ ATTACHED(w, x) ] ;
      [ SUPPORTED(y) ∧ SUPPORTS(x, y) ∧ ATTACHED(w, x) ∧ ATTACHED(x, y) ∧ ATTACHED(y, z) ] ;
      [ SUPPORTED(y) ∧ SUPPORTED(x) ∧ SUPPORTS(y, x) ] } ∧
    { [ SUPPORTED(y) ∧ ATTACHED(w, x) ] ;
      [ SUPPORTED(y) ∧ SUPPORTED(x) ∧ SUPPORTS(x, y) ∧ SUPPORTS(y, x) ∧ ATTACHED(w, x) ] ;
      [ SUPPORTED(y) ∧ SUPPORTED(x) ∧ SUPPORTS(y, x) ] } ∧
    { [ SUPPORTED(y) ∧ ATTACHED(w, x) ∧ SUPPORTS(z, y) ∧ CONTACTS(y, z) ] ;
      [ SUPPORTED(y) ∧ ATTACHED(y, z) ] ;
      [ SUPPORTED(y) ∧ SUPPORTED(x) ∧ SUPPORTS(y, x) ∧ CONTACTS(y, z) ] } ∧
    { [ SUPPORTED(y) ∧ ATTACHED(w, x) ∧ SUPPORTS(z, y) ∧ CONTACTS(y, z) ] ;
      [ SUPPORTED(y) ∧ ATTACHED(w, x) ∧ ATTACHED(y, z) ] ;
      [ SUPPORTED(y) ∧ SUPPORTED(x) ∧ SUPPORTS(y, x) ] } ∧
    { [ SUPPORTED(y) ∧ ATTACHED(w, x) ∧ SUPPORTS(z, y) ∧ CONTACTS(y, z) ∧
        ¬◇SUPPORTS(x, y) ∧ ¬◇SUPPORTS(y, x) ∧ ¬◇CONTACTS(x, y) ∧ ¬◇ATTACHED(x, y) ] ;
      [ SUPPORTED(y) ∧ ATTACHED(w, x) ] ;
      [ SUPPORTED(y) ∧ SUPPORTED(x) ∧ SUPPORTS(y, x) ] } ∧
    { [ SUPPORTED(y) ∧ ATTACHED(w, x) ] ;
      [ SUPPORTED(y) ∧ ATTACHED(w, x) ∧ SUPPORTS(z, y) ∧ CONTACTS(y, z) ] ;
      [ SUPPORTED(y) ∧ SUPPORTED(x) ] } ∧
    { [ SUPPORTED(y) ∧ ATTACHED(w, x) ] ;
      [ SUPPORTED(y) ∧ ATTACHED(w, x) ∧ SUPPORTS(z, y) ∧ SUPPORTED(x) ] ;
      [ SUPPORTED(y) ∧ SUPPORTED(x) ] } ∧
    { [ SUPPORTED(y) ∧ ATTACHED(w, x) ] ;
      [ SUPPORTED(y) ∧ CONTACTS(y, z) ∧ SUPPORTS(z, y) ∧ SUPPORTED(x) ∧
        ¬◇SUPPORTS(x, y) ∧ ¬◇ATTACHED(x, y) ] ;
      [ SUPPORTED(y) ∧ SUPPORTED(x) ] } ∧
    { SUPPORTED(y) ;
      [ SUPPORTED(y) ∧ CONTACTS(y, z) ∧ SUPPORTS(z, y) ∧ SUPPORTED(x) ∧
        ¬◇SUPPORTS(x, y) ∧ ¬◇ATTACHED(x, y) ∧ ¬◇ATTACHED(y, z) ] ;
      [ SUPPORTED(y) ∧ SUPPORTED(x) ∧ SUPPORTS(y, x) ] } ∧
    { [ SUPPORTED(y) ∧ ATTACHED(w, x) ] ;
      [ SUPPORTED(y) ∧ CONTACTS(y, z) ∧ SUPPORTED(x) ] ;
      [ SUPPORTED(y) ∧ SUPPORTED(x) ∧ SUPPORTS(y, x) ] } ∧
    { [ SUPPORTED(y) ∧ ATTACHED(w, x) ] ;
      [ SUPPORTED(y) ∧ SUPPORTED(x) ∧ SUPPORTS(y, x) ] ;
      [ SUPPORTED(y) ∧ SUPPORTED(x) ∧ SUPPORTS(y, x) ∧ CONTACTS(x, y) ∧ CONTACTS(y, z) ∧
        ¬◇SUPPORTS(x, y) ∧ ¬◇ATTACHED(w, x) ∧ ¬◇ATTACHED(x, y) ∧ ¬◇ATTACHED(y, z) ] } ∧
    { SUPPORTED(y) ;
      [ SUPPORTED(y) ∧ SUPPORTED(x) ∧ SUPPORTS(y, x) ∧ SUPPORTS(z, y) ∧
        CONTACTS(x, y) ∧ CONTACTS(y, z) ] ;
      [ SUPPORTED(y) ∧ SUPPORTED(x) ∧ SUPPORTS(y, x) ∧ CONTACTS(x, y) ∧ CONTACTS(y, z) ∧
        ¬◇SUPPORTS(x, y) ∧ ¬◇ATTACHED(w, x) ∧ ¬◇ATTACHED(x, y) ∧ ¬◇ATTACHED(y, z) ] }

Figure 16: The learned 3-AMA definition for STACK(w, x, y, z).

    { [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(y, x) ∧
        CONTACTS(x, y) ∧ CONTACTS(y, z) ∧ ¬◇SUPPORTS(w, x) ∧
        ¬◇SUPPORTS(x, y) ∧ ¬◇ATTACHED(w, x) ∧ ¬◇ATTACHED(x, y) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ ATTACHED(w, x) ∧ SUPPORTS(z, y) ∧
        CONTACTS(y, z) ∧ ATTACHED(w, x) ∧ ¬◇SUPPORTS(x, y) ∧
        ¬◇SUPPORTS(y, x) ∧ ¬◇CONTACTS(x, y) ∧
        ¬◇ATTACHED(x, y) ∧ ¬◇ATTACHED(y, z) ] } ∧
    { [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(y, x) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ ATTACHED(w, x) ∧ ATTACHED(y, z) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ ATTACHED(w, x) ∧ CONTACTS(y, z) ] } ∧
    { [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(y, x) ∧ CONTACTS(y, z) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ ATTACHED(y, z) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ ATTACHED(w, x) ∧ CONTACTS(y, z) ] } ∧
    { [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(y, x) ∧ CONTACTS(x, y) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(y, x) ∧ ATTACHED(x, y) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ ATTACHED(w, x) ] } ∧
    { [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(y, x) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ CONTACTS(y, z) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ ATTACHED(w, x) ] } ∧
    { [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(y, x) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ ATTACHED(w, x) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ ATTACHED(w, x) ∧ SUPPORTS(z, y) ∧
        CONTACTS(y, z) ∧ ATTACHED(w, x) ∧ ¬◇SUPPORTS(x, y) ∧
        ¬◇SUPPORTS(y, x) ∧ ¬◇CONTACTS(x, y) ∧
        ¬◇ATTACHED(x, y) ∧ ¬◇ATTACHED(y, z) ] } ∧
    { [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(y, x) ∧
        CONTACTS(x, y) ∧ CONTACTS(y, z) ∧
        ¬◇SUPPORTS(w, x) ∧ ¬◇SUPPORTS(x, y) ∧
        ¬◇ATTACHED(w, x) ∧ ¬◇ATTACHED(x, y) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(y, x) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ ATTACHED(w, x) ] } ∧
    { [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(y, x) ∧ CONTACTS(y, z) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(y, x) ∧ ATTACHED(y, z) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ ATTACHED(w, x) ] } ∧
    { [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(y, x) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(y, x) ∧ ATTACHED(y, z) ∧
        SUPPORTS(x, y) ∧ ATTACHED(w, x) ∧ ATTACHED(x, y) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ ATTACHED(w, x) ] } ∧
    { [ SUPPORTED(x) ∧ SUPPORTED(y) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(y, x) ∧ ATTACHED(w, x) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(w, x) ∧ ATTACHED(w, x) ] } ∧
    { [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(y, x) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(w, x) ∧ ATTACHED(w, x) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ ATTACHED(w, x) ] } ∧
    { [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(y, x) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ CONTACTS(y, z) ∧
        ¬◇SUPPORTS(x, y) ∧ ¬◇ATTACHED(x, y) ∧ ¬◇ATTACHED(y, z) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ] }

Figure 17: The learned 3-AMA definition for UNSTACK(w, x, y, z).

    { [ SUPPORTED(x) ∧ SUPPORTS(y, x) ∧ CONTACTS(y, x) ∧
        ¬◇SUPPORTS(w, x) ∧ ¬◇SUPPORTS(z, x) ∧ ¬◇CONTACTS(x, z) ∧
        ¬◇ATTACHED(w, x) ∧ ¬◇ATTACHED(y, x) ∧ ¬◇ATTACHED(x, z) ] ;
      SUPPORTED(x) ;
      [ SUPPORTED(x) ∧ SUPPORTS(z, x) ∧ CONTACTS(x, z) ∧
        ¬◇SUPPORTS(w, x) ∧ ¬◇SUPPORTS(y, x) ∧ ¬◇CONTACTS(y, x) ∧
        ¬◇ATTACHED(w, x) ∧ ¬◇ATTACHED(y, x) ∧ ¬◇ATTACHED(x, z) ] } ∧
    { [ SUPPORTED(x) ∧ SUPPORTS(y, x) ] ;
      [ SUPPORTED(x) ∧ ATTACHED(w, x) ] ;
      SUPPORTED(x) } ∧
    { SUPPORTED(x) ;
      [ SUPPORTED(x) ∧ ATTACHED(w, x) ∧ ATTACHED(x, z) ] ;
      SUPPORTED(x) } ∧
    { [ SUPPORTED(x) ] ;
      [ SUPPORTED(x) ∧ ATTACHED(x, z) ] ;
      [ SUPPORTED(x) ∧ CONTACTS(x, z) ] } ∧
    { SUPPORTED(x) ;
      [ SUPPORTED(x) ∧ ATTACHED(w, x) ∧ SUPPORTS(w, x) ] ;
      SUPPORTED(x) } ∧
    { SUPPORTED(x) ;
      [ SUPPORTED(x) ∧ ATTACHED(w, x) ∧ ATTACHED(y, x) ] ;
      SUPPORTED(x) } ∧
    { [ SUPPORTED(x) ∧ CONTACTS(y, x) ] ;
      [ SUPPORTED(x) ∧ ATTACHED(y, x) ] ;
      SUPPORTED(x) }

Figure 18: The learned 3-AMA definition for MOVE(w, x, y, z).

    { [ ¬◇SUPPORTED(x) ∧ ¬◇SUPPORTS(z, y) ∧ ¬◇SUPPORTS(y, x) ∧
        ¬◇CONTACTS(x, y) ∧ ¬◇CONTACTS(z, y) ∧
        ¬◇ATTACHED(w, x) ∧ ¬◇ATTACHED(z, y) ] ;
      true ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(z, y) ∧
        SUPPORTS(y, x) ∧ CONTACTS(x, y) ∧
        CONTACTS(z, y) ∧ ¬◇ATTACHED(w, y) ] } ∧
    { [ ¬◇SUPPORTED(x) ∧ ¬◇SUPPORTS(z, y) ∧ ¬◇SUPPORTS(y, x) ∧
        ¬◇CONTACTS(x, y) ∧ ¬◇CONTACTS(z, y) ∧
        ¬◇ATTACHED(w, x) ∧ ¬◇ATTACHED(z, y) ] ;
      ATTACHED(w, y) ;
      SUPPORTED(y) } ∧
    { true ;
      [ SUPPORTED(y) ∧ ¬◇ATTACHED(w, x) ∧ ¬◇ATTACHED(z, y) ] ;
      SUPPORTED(y) } ∧
    { true ;
      [ SUPPORTED(y) ∧ ATTACHED(z, y) ] ;
      [ SUPPORTED(y) ∧ CONTACTS(z, y) ] } ∧
    { true ;
      [ SUPPORTED(y) ∧ SUPPORTS(z, y) ∧ CONTACTS(z, y) ∧ ATTACHED(w, z) ] ;
      SUPPORTED(y) } ∧
    { true ;
      [ SUPPORTED(y) ∧ ATTACHED(w, y) ∧ ATTACHED(z, y) ] ;
      SUPPORTED(y) }

Figure 19: The learned 3-AMA definition for ASSEMBLE(w, x, y, z).

    { [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(y, x) ∧ SUPPORTS(z, y) ∧
        CONTACTS(x, y) ∧ CONTACTS(z, y) ∧ ¬◇SUPPORTS(w, x) ∧
        ¬◇SUPPORTS(w, y) ∧ ¬◇SUPPORTS(x, y) ∧ ¬◇ATTACHED(x, w) ∧
        ¬◇ATTACHED(w, y) ∧ ¬◇ATTACHED(x, y) ∧ ¬◇ATTACHED(z, y) ] ;
      SUPPORTED(y) ;
      [ SUPPORTED(y) ∧ ¬◇SUPPORTED(x) ∧ ¬◇SUPPORTS(w, x) ∧
        ¬◇SUPPORTS(z, y) ∧ ¬◇SUPPORTS(y, x) ∧ ¬◇CONTACTS(x, y) ∧
        ¬◇CONTACTS(z, y) ∧ ¬◇ATTACHED(x, w) ∧ ¬◇ATTACHED(z, y) ] } ∧
    { [ SUPPORTED(x) ∧ SUPPORTED(y) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(w, x) ∧
        SUPPORTS(z, y) ∧ CONTACTS(z, y) ∧ ATTACHED(x, w) ] ;
      SUPPORTED(y) } ∧
    { [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(z, y) ∧
        SUPPORTS(y, x) ∧ CONTACTS(x, y) ∧ CONTACTS(z, y) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(y, x) ∧ ATTACHED(x, y) ] ;
      SUPPORTED(y) } ∧
    { [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(y, x) ∧ CONTACTS(z, y) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(x, y) ∧
        SUPPORTS(y, z) ∧ ATTACHED(x, y) ∧ ATTACHED(z, y) ] ;
      SUPPORTED(y) } ∧
    { [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(y, x) ] ;
      [ SUPPORTED(x) ∧ SUPPORTED(y) ∧ SUPPORTS(x, y) ∧
        SUPPORTS(y, z) ∧ ATTACHED(x, y) ∧ ATTACHED(z, y) ∧ ATTACHED(x, w) ] ;
      SUPPORTED(y) } ∧
    { SUPPORTED(y) ;
      [ SUPPORTED(y) ∧ ATTACHED(w, y) ∧ ATTACHED(z, y) ] ;
      SUPPORTED(y) } ∧
    { SUPPORTED(y) ;
      [ SUPPORTED(y) ∧ SUPPORTS(w, y) ∧ ATTACHED(w, y) ] ;
      SUPPORTED(y) }

Figure 20: The learned 3-AMA definition for DISASSEMBLE(w, x, y, z).
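
The learned definitions in this appendix share one shape: a conjunction of timelines, each timeline a ";"-separated sequence of states, and each state a conjunction of propositions. The following is a minimal Python sketch of that shape, not the representation used in our implementation. The type names, the helpers `render` and `is_k_ama`, and the reading of k-AMA as "no timeline has more than k states" are assumptions made for illustration. The example timeline is transcribed by hand from the last timeline of Figure 18.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical encoding, for illustration only.
State = FrozenSet[str]          # a conjunction of propositions
Timeline = Tuple[State, ...]    # states separated by ";"
Formula = List[Timeline]        # a conjunction of timelines


def render_state(state: State) -> str:
    """Render a state as a bracketed conjunction; the empty state is 'true'."""
    if not state:
        return "true"
    return "[" + " ∧ ".join(sorted(state)) + "]"


def render(formula: Formula) -> str:
    """Render a formula with one timeline per line, joined by '∧'."""
    lines = ["{ " + " ; ".join(render_state(s) for s in tl) + " }" for tl in formula]
    return " ∧\n".join(lines)


def is_k_ama(formula: Formula, k: int) -> bool:
    """Assumed reading of k-AMA: every timeline has at most k states."""
    return all(len(tl) <= k for tl in formula)


# Last timeline of the learned Move definition (Figure 18).
example: Formula = [
    (
        frozenset({"Supported(x)", "Contacts(y, x)"}),
        frozenset({"Supported(x)", "Attached(y, x)"}),
        frozenset({"Supported(x)"}),
    )
]

print(render(example))
print(is_k_ama(example, 3))
```

Storing each state as a set makes the order of conjuncts irrelevant, which matches how the states are read in the figures.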



446 



Learning Temporal Events 



References 

Agrawal, R., & Srikant, R. (1995). Mining sequential patterns. In Proceedings of the Eleventh 
International Conference on Data Engineering, pp. 3-14. 

Allen, J. F. (1983). Maintaining knowledge about temporal intervals. Communications of the ACM, 
26(11), 832-843. 

Angluin, D. (1987). Learning regular sets from queries and counterexamples. Information and 

Computation, 75, 87-106. 

Bacchus, R, & Kabanza, F. (2000). Using temporal logics to express search control knowledge for 
planning. Artificial Intelligence, 16, 123-191. 

Bobick, A. F., & Ivanov, Y. A. (1998). Action recognition using probabilistic parsing. In Proceed- 
ings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 
pp. 196-202, Santa Barbara, CA. 

Borchardt, G. C. (1985). Event calculus. In Proceedings of the Ninth International Joint Conference 

on Artificial Intelligence, pp. 524-527, Los Angeles, CA. 

Brand, M. (1997a). The inverse Hollywood problem: From video to scripts and storyboards via 
causal analysis. In Proceedings of the Fourteenth National Conference on Artificial Intelli- 
gence, pp. 132-137, Providence, RI. 

Brand, M. (1997b). Physics-based visual understanding. Computer Vision and Image Understand- 
ing, 55(2), 192-205. 

Brand, M., & Essa, 1. (1995). Causal analysis for visual gesture understanding. In Proceedings of 
the AAAI Fall Symposium on Computational Models for Integrating Language and Vision. 

Brand, M., Oliver, N., & Pentland, A. (1997). Coupled hidden Markov models for complex action 
recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision 
and Pattern Recognition. 

Cohen, P. (2001). Fluent learning: Elucidating the structure of episodes. In Proceedings of the 
Fourth Symposium on Intelligent Data Analysis. 

Cohen, W. (1994). Grammatically biased learning: Learning logic programs using an expUcit an- 
tecedent description lanugage. Artificial Intelligence, 68, 303-366. 

Cohen, W., & Hirsh, H. (1994). Learning the CLASSIC description logic: Theoretical and experimen- 
tal results. In Proceedings of the Fourth International Conference on Principles of Knowledge 
Representation and Reasoning, pp. 121-133. 

De Raedt, L., & Dehaspe, L. (1997). Clausal discovery. Machine Learning, 26, 99-146. 

Dehaspe, L., & De Raedt, L. (1996). DLAB: A declarative language bias formahsm. In Proceedings 
of the Ninth International Syposium on Methodologies for Intelligent Systems, pp. 613-622. 

Fikes, R., & Nilsson, N. (1971). STRIPS: A new approach to the application of theorem proving to 
problem solving. Artificial Intelligence, 2(3/4). 

Hoppner, F. (2001). Discovery of temporal patterns — ^Learning rules about the qualitative behaviour 
of time series. In Proceedings of the Fifth European Conference on Principles and Practice 
of Knowledge Discovery in Databases. 

447 



Fern, Givan, & Siskind 



Kam, P., & Fu, A. (2000). Discovering temporal patterns for interval-based events. In Proceedings 
of the Second International Conference on Data Warehousing and Knowledge Discovery. 

Klingspor, V., Morik, K., & Rieger, A. D. (1996). Learning concepts from sensor data of a mobile 
robot. Artificial Intelligence, 23(2/3), 305-332. 

Lang, K., Pearknutter, B., & Price, R. (1998). Results of the Abbadingo one DFA learning com- 
petition and a new evidence-driven state merging algorithm. In Proceedings of the Fourth 
International Colloquium on Grammatical Inference. 

Lavrac, N., Dzeroski, S., & Grobelnik, M. (1991). Learning nonrecursive definitions of relations 
with LINUS. In Proceedings of the Fifth European Working Session on Learning, pp. 265- 
288. 

Mann, R., & Jepson, A. D. (1998). Toward the computational perception of action. In Proceedings 
of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
794-799, Santa Barbara, CA. 

Mannila, H., Toivonen, H., & Verkamo, A. I. (1995). Discovery of frequent episodes in sequences. 
In Proceedings of the First International Conference on Knowledge Discovery and Data Min- 
ing. 

Mitchell, T. (1982). Generalization as search. Artificial Intelligence, 18(2), 517-42. 

Morales, E. (1997). Pal: A pattern-based first-order inductive system. Machine Learning, 26, 221- 
252. 

Muggleton, S. (1995). Inverting entailment and Progol. Machine Intelligence, 14, 133-188. 

Muggleton, S., & Feng, C. (1992). Efficient induction of logic programs. In Muggleton, S. (Ed.), 
Inductive Logic Programming, pp. 281-298. Academic Press. 

Muggleton, S., & De Raedt, L. (1994). Inductive logic programming: Theory and methods. Journal 

of Logic Programming, 19/20, 629-679. 

Pinhanez, C, & Bobick, A. (1995). Scripts in machine understanding of image sequences. In 
Proceedings of the AAAI Fall Symposium Series on Computational Models for Integrating 
Language and Vision. 

Plotkin, G. D. (1971). Automatic Methods of Inductive Inference. Ph.D. thesis, Edinburgh Univer- 
sity. 

Regier, T. P. (1992). The Acquisition of Lexical Semantics for Spatial Terms: A Connectionist Model 
of Perceptual Categorization. Ph.D. thesis. University of California at Berkeley. 

Roth, D., & Yih, W. (2001). Relational learning via propositional algorithms: An information extrac- 
tion case study. In Proeedings of the Seventeenth International Joint Conference on Artificial 
Intelligence. 

Shoham, Y. (1987). Temporal logics in AI: Semantical and ontological considerations. Artificial 
Intelligence, 55(1), 89-104. 

Siskind, J. M. (2000). Visual event classification via force dynamics. In Proceedings of the Seven- 
teenth National Conference on Artificial Intelligence, pp. 149-155, Austin, TX. 

Siskind, J. M. (2001). Grounding the lexical semantics of verbs in visual perception using force 
dynamics and event logic. Journal of Artificial Intelligence Research, 15, 31-90. 

448 



Learning Temporal Events 



Siskind, J. M., & Morris, Q. (1996). A maximum-likelihood approach to visual event classifica- 
tion. In Proceedings of the Fourth European Conference on Computer Vision, pp. 347-360, 
Cambridge, UK. Springer- Verlag. 

Talmy, L. (1988). Force dynamics in language and cognition. Cognitive Science, 12, 49-100. 

Yamoto, J., Ohya, J., & Ishii, K. (1992). Recognizing human action in time-sequential images using 
hidden Markov model. In Proceedings of the IEEE Conference on Computer Vision and 
Pattern Recognition, pp. 379-385. 



449 



